What is the Worth of Free Casino Credit?


Michael Orkin and Richard Kakigi 


THE ZARIN CASE—In 1980, a compulsive gambler named David Zarin used a 
generous credit line to run up a huge debt playing craps in an Atlantic City casino. 
When the casino finally cut off Zarin’s credit, he owed over $3 million. Due in part 
to New Jersey’s laws protecting compulsive gamblers, the debt was deemed 
unenforceable by the courts, leading the casino to settle with Zarin for a small 
fraction of the amount he owed. Later, the Internal Revenue Service tried to 
collect taxes on the approximately $3 million Zarin didn’t repay, claiming that 
cancellation of the debt made it taxable income. Since Zarin had never actually 
received any cash (he was always given chips, which he promptly lost at the craps 
table), an appellate court finally ruled that Zarin had no tax obligation [6]. The 
courts never asked what Zarin’s credit line was actually worth. Surely, it was worth 
something. With $3 million in chips to play with, there is a chance, albeit
small, that a gambler will make a profit and leave the casino with cash in his
pocket. We will show that, when the gambler's play is modeled as a gambler’s ruin
problem with pass line bets at craps, the “worth” of a sufficiently large free line
of credit is approximately $197,000.


FREE CREDIT—Suppose a casino gives you $3 million in chips for gambling, but 
to redeem chips for cash you must first pay back the $3 million. You incur no debt 
if you lose. You restrict your betting to the pass line bet in craps. What is your 
optimal betting strategy and how much profit will it yield on the average? 

To answer this question, we must define “optimal.” According to the laws of 
probability, in repeated play of a game in which the house has an edge, you will 
eventually end up broke no matter what strategy you use. We have to settle for 
something more modest than guaranteed success. 


BOLD PLAY—Consider the following approach. You have a fixed monetary goal 
and will keep betting until you either reach your goal, in which case you quit a 
winner, or go broke. Nothing else matters. For you, a strategy is optimal if it 
maximizes the chance that you reach your goal. Dubins and Savage showed that 
when the house has an edge, bold play is optimal. Bold play means that you bet 
your entire bankroll on each bet, or as much as is necessary to reach your goal. For 
example, suppose you have a bankroll of $9,000 and your goal is $20,000. On your 
first bet, bet the entire $9,000. If you lose, you are broke, and the game is over. If 
you win, your bankroll has increased to $18,000. Since you are now only $2,000 
short of your goal, bet $2,000. If you win, you have reached your goal. If you lose, 
your bankroll is $16,000. Bet $4,000, just enough to reach your goal. And so on. 
Caution: Bold play doesn’t guarantee a profit or shift the edge to your favor. It 
only maximizes the probability that you will reach your goal before going broke. 
If you are a bold player and a big bettor, you will have a problem if the
necessary bet exceeds the house limit. For example, suppose you have $10,000, 
your goal is $25,000, and the maximum bet is $5,000. Bold play says bet $10,000, 
but because of the betting limit, you are forced to bet $5,000. When you always bet 
the same amount, the chance of reaching your goal can be computed with the 
Gambler’s Ruin formula. 
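
As a rough illustration of the betting rule just described, the following Python sketch computes the size of a single bold-play bet. The function name bold_bet and its house_limit parameter are our own labels for this illustration; they are not notation from Dubins and Savage. The sketch assumes 1 to 1 payoff odds, so that the shortfall from the goal is also the amount needed to reach it.

```python
def bold_bet(bankroll, goal, house_limit=None):
    """Size of the next bet under bold play with 1 to 1 payoff odds.

    Bet the whole bankroll, or just enough to reach the goal,
    whichever is smaller. house_limit is a hypothetical cap on one bet.
    """
    bet = min(bankroll, goal - bankroll)
    if house_limit is not None:
        bet = min(bet, house_limit)
    return max(bet, 0)  # nothing left to bet once the goal is reached


# The sequence from the example: $9,000 bankroll, $20,000 goal
for bankroll in (9_000, 18_000, 16_000):
    print(bankroll, bold_bet(bankroll, 20_000))

# With a $5,000 house limit the bold bet of $10,000 is cut down to the limit
print(bold_bet(10_000, 25_000, house_limit=5_000))
```

Once the house limit binds on every bet, the player is in effect betting a fixed unit, which is the situation the Gambler's Ruin formula covers.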


GAMBLER’S RUIN—Suppose you bet repeatedly, always betting the same amount, 
which we shall call a “betting unit,” and suppose you keep playing until you either
reach your goal or go broke. Then, assuming 1 to 1 payoff odds, the chance that 
you reach your goal is computed as follows. 

Let 


p = your chance of winning one bet
q = 1 − p = your chance of losing one bet
i = your current fortune, expressed in betting units
N = your goal, expressed in betting units (N > i).

Then if P(i, N) = Prob(you reach N before going broke) we have (Feller, page
344)

\[
P(i, N) =
\begin{cases}
\dfrac{1 - (q/p)^{i}}{1 - (q/p)^{N}} & \text{if } p \neq \tfrac{1}{2}, \\[1.5ex]
\dfrac{i}{N} & \text{if } p = \tfrac{1}{2}.
\end{cases}
\]


GAMBLER’S RUIN AND THE PASS LINE BET—The pass line bet in craps is 
almost a fair game. The approximate chance of winning is .493 (the exact chance is 
244/495). The approximate chance of losing is .507. Payoff odds are 1 to 1. 
According to the law of averages, in repeated play, a gambler will win 49.3% and 
lose 50.7% of his bets for a net loss of 1.4 cents per dollar bet. Suppose that you 
currently have $10,000, your goal is $25,000, and you always bet $5,000 on the pass 
line. Then, using the approximate probabilities, we have

p = .493

q = .507

i = 2 (2 betting units of $5,000)

N = 5 (5 betting units)

and

\[
P(\text{you reach goal before going broke}) = \frac{1 - (.507/.493)^{2}}{1 - (.507/.493)^{5}} \approx .38.
\]


The chance that you reach the goal of $50,000 (10 betting units) before going 
broke is .18. The chance that you reach the goal of $100,000, is .08. The chance of 
turning $10,000 into $1 million by repeatedly making $5,000 pass line bets is .0002. 
The bigger your goal, the less your chance of reaching it. In fact, for any bet where
the chance of losing is greater than the chance of winning (q/p > 1), as the goal, 
N, gets larger, the chance of reaching it gets smaller, eventually approaching 0. 
When the house has an edge, repeated play grinds down the gambler’s bankroll. 
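
The Gambler's Ruin formula is easy to evaluate by machine. The sketch below is one possible Python rendering; the function name reach_goal_probability is our own choice, and the code assumes 1 to 1 payoff odds and bets of exactly one unit, as in the text. Run with the approximate pass line probability p = .493, it should come out close to the figures quoted above for a $10,000 bankroll and $5,000 bets.

```python
def reach_goal_probability(p, i, N):
    """Chance of reaching N units before going broke, starting from i units,
    when each one-unit bet wins with probability p at 1 to 1 odds."""
    if p == 0.5:
        return i / N
    r = (1.0 - p) / p  # the ratio q/p
    return (1.0 - r**i) / (1.0 - r**N)


# $10,000 bankroll = 2 units of $5,000; goals of $25,000, $50,000,
# $100,000 and $1 million are 5, 10, 20 and 200 units respectively.
for goal_units in (5, 10, 20, 200):
    chance = reach_goal_probability(0.493, 2, goal_units)
    print(goal_units, round(chance, 4))
```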


EXPECTED PROFIT—What are your average winnings with free casino credit 
when you always make maximum pass line bets of $15,000? Say you start with i 
units credit and keep betting until you either reach N units or go broke (N > i). 
You will end up with either a profit of N − i or 0 (no debt). Denoting Expected
Profit by E(i, N), we get

\[
E(i, N) = (N - i)\,P(i, N) = (N - i)\,\frac{1 - (q/p)^{i}}{1 - (q/p)^{N}}.
\]

This amount varies according to the goal, N. Naturally, we want to find the best
N, that is, the N that maximizes E(i, N). Starting with $3 million credit, always
making maximum $15,000 pass line bets, we have i = 200 units, p = .493, q = .507
and

\[
E(200, N) = (N - 200) \times P(200, N) = (N - 200) \times \frac{1 - (.507/.493)^{200}}{1 - (.507/.493)^{N}}.
\]

This expression reaches a maximum when N = 236 (36 unit profit), in which case,
Expected Profit = $197,000 (approximately). The optimal strategy, then, is to keep
making maximum bets until you either reach a 36 unit ($540,000) profit and quit,
or go broke. Using this strategy, a $3 million free credit is “worth” $197,000. (It is
not clear how the judge would have used this information, since tax liability is
based upon actual rather than expected gains and losses.)
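
The maximizing goal can be found by brute force. The following Python sketch simply evaluates the expected profit for each candidate goal and keeps the best one. The names expected_profit and best_goal, and the search window max_extra, are hypothetical helpers introduced here; the code assumes p is not exactly 1/2.

```python
def expected_profit(p, i, N):
    """Expected profit in units: (N - i) times the chance of reaching N from i."""
    r = (1.0 - p) / p  # the ratio q/p, assumed different from 1
    return (N - i) * (1.0 - r**i) / (1.0 - r**N)


def best_goal(p, i, max_extra=1000):
    """Goal N in (i, i + max_extra] with the largest expected profit."""
    candidates = range(i + 1, i + max_extra + 1)
    return max(candidates, key=lambda N: expected_profit(p, i, N))


UNIT = 15_000  # dollars per betting unit
N_best = best_goal(0.493, 200)
profit_dollars = UNIT * expected_profit(0.493, 200, N_best)
print(N_best, round(profit_dollars))
```

With i = 200 and p = .493 the search settles on a 36 unit profit, and the dollar figure rounds to the $197,000 quoted in the text.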


THERE IS A LIMIT—It turns out that there is a limit to what free credit can do. 
After a while, increasing the credit line doesn’t add to Expected Profit. In fact, the 
Expected Profit of $197,000 is the most a craps-playing gambler can hope for, no 
matter what the credit line. If you happen to get $1 billion free credit, or even $100 
billion, you should still play for a 36 unit profit, and your Expected Profit is still 
$197,000. This interesting fact is obtained as follows. 

Let N → ∞ and i → ∞ with D = N − i fixed, and let r = q/p > 1. Then

\[
P(i, N) = \frac{1 - r^{i}}{1 - r^{N}} \to r^{-D}
\]

and E(i, N) → D r^{−D}. The maximum of D r^{−D} occurs when

\[
D = \frac{1}{\ln(r)}.
\]
Thus, for large i, the target increment that maximizes expected payoff is
approximately 1/ln(r). Using this increment for each i, and letting i → ∞, we have

\[
P\!\left(i,\; i + \frac{1}{\ln(r)}\right) \to r^{-1/\ln(r)} = e^{-1}
\quad\text{and}\quad
E\!\left(i,\; i + \frac{1}{\ln(r)}\right) \to \bigl(e \ln(r)\bigr)^{-1},
\]

where e is the base for natural logarithms (e^{−1} = approx .37). If one unit = $C, then
for large i, the worth of i units of credit is approximately $C/(e ln(r)).

For pass line bets (using the approximate probability p = .493), r =
.507/.493 = 1.028, so ln(r) = .028 and D = 1/.028 = 36. Thus, for large i, the
optimal goal is a 36 unit profit and your chance of reaching it before going broke is
1/e = .37. (In fact, for large i, your chance of reaching the optimal goal before
going broke is 1/e, REGARDLESS of the game you are playing, as long as
q > p.) In this case, Expected Profit = 1/(e × .028) = 13.14 units. When the betting
unit = $15,000, Expected Profit = $197,000.
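
The limiting values are simple enough to check numerically. The Python sketch below evaluates the optimal increment 1/ln(r), the chance r^(−D) of reaching it, and the resulting worth in dollars. The function name limiting_worth is our own, and the code assumes q > p so that ln(r) is positive.

```python
import math


def limiting_worth(p, unit_dollars):
    """Large-credit limit: optimal profit target, chance of reaching it,
    and the dollar worth of the credit line. Assumes q > p."""
    r = (1.0 - p) / p               # the ratio q/p > 1
    d_opt = 1.0 / math.log(r)       # maximizer of D * r**(-D)
    reach_prob = r ** (-d_opt)      # equals 1/e for every r > 1
    worth = unit_dollars * d_opt * reach_prob
    return d_opt, reach_prob, worth


d_opt, reach_prob, worth = limiting_worth(0.493, 15_000)
print(round(d_opt, 2), round(reach_prob, 4), round(worth))
```

The middle value does not depend on p, which is the content of the remark that the chance of reaching the optimal goal is 1/e regardless of the game.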


FAVORABLE GAMES—Bold play is fine if the house has an edge, but what if 
YOU have an edge? This doesn’t happen in most casino games, but it is generally 
agreed that blackjack, sports betting, and horse racing can, at least theoretically, 
give a clever player an edge. In this case, the law of averages guarantees that the 
gambler, not the casino, will be a winner in repeated play, provided that reckless 
betting doesn’t cause early bankruptcy. When the gambler has a huge bankroll, 
even maximum allowable bets are prudent, and there is little chance of going 
broke. In this case, the gambler is almost assured of whatever goal he wants to 
reach, subject to various constraints. Consider blackjack. 


BLACKJACK—Skilled blackjack players will win about 51% of their bets under 
ideal playing conditions (It is unclear how often such conditions exist, especially 
for big bettors). Suppose the maximum bet is $15,000, same as craps, and the 
gambler, having a $3 million bankroll, makes $15,000 blackjack bets with win 
probability = .51. Applying the gambler’s ruin formula, we see a stark difference 
between blackjack and craps. 

In craps (or any bets where your chance of losing is greater than your chance of 
winning), your chance of reaching your goal before going broke decreases to 0 as 
your goal increases. In blackjack, with p = .51, (or any bets where your chance of 
winning is greater than your chance of losing), your chance of reaching your goal 
before going broke decreases as your goal increases, but not to 0. Instead, starting 
with i betting units, your chance of reaching your goal never goes below 1 − (q/p)^i.
With p = .51, q = .49, and starting with i = 200 betting units, your chance of
reaching any goal before going broke is always greater than .999. This follows
directly from the gambler’s ruin formula. With a 200 betting unit bankroll, betting
a unit at a time, you’re almost certain to reach your goal, no matter how large,
provided that you have enough time to play (if you set your sights unrealistically
high, you may die of old age before you reach your goal).


WHAT A DIFFERENCE AN EDGE MAKES—As an example of how the slight 
difference in pass line and blackjack win probabilities is magnified over repeated 
play, suppose you start with i = 200 betting units and have a goal of N = 500 
units. Making pass line bets, with p = .493, your chance of reaching 500 units 
before going broke is .00023. Making blackjack bets, with p = .51, your chance of 
reaching 500 units is .99966. 
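
The contrast between the two games can be reproduced with a few lines of Python. This is only a sketch: the function name reach_goal_probability is our own, and it applies the gambler's ruin formula with 1 to 1 odds and one-unit bets. It also computes the floor 1 − (q/p)^i below which the favorable-game probability never falls.

```python
def reach_goal_probability(p, i, N):
    """Gambler's ruin chance of reaching N before 0, starting from i (p != 1/2)."""
    r = (1.0 - p) / p  # the ratio q/p
    return (1.0 - r**i) / (1.0 - r**N)


craps = reach_goal_probability(0.493, 200, 500)      # house has the edge
blackjack = reach_goal_probability(0.51, 200, 500)   # player has the edge
floor = 1.0 - (0.49 / 0.51) ** 200                   # limit as the goal grows

print(craps, blackjack, floor)
```

Raising the blackjack goal from 500 units to any larger number moves the second value down toward the floor, but never past it.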
RECKLESS BANKRUPTCY—If your chance of winning is greater than your
chance of losing, you can almost certainly reach your goal with prudent play. 
However, if your betting unit is recklessly large relative to your bankroll, you may 
suddenly go broke. For example, suppose you are playing blackjack and you start 
by betting your entire bankroll. With win probability = .51, you have a .49 chance 
of going broke on the first bet. If there is no betting limit, and you always bet your 
entire bankroll, you are almost certain to go broke at some point. Thus, you 
shouldn’t bet recklessly. 


THE KELLY SYSTEM —The Kelly system, named after the mathematician J. L. 
Kelly who invented it, specifies how much to bet if you have an edge, as in the case 
of an expert blackjack player. With the Kelly system, you always bet a fixed 
fraction of your bankroll. The more you have, the more you bet; the less you have,
the less you bet. Leo Breiman showed that the Kelly system is optimal for
favorable games (you have an edge) in two important ways: First, it does better in
the long run than any substantially different strategy. Second, the expected number
of bets necessary to reach a specified goal with the Kelly system is less than with
any other strategy. If the house has an edge, as with the pass line bet, the Kelly
system says “don’t bet.”

To use the Kelly system, you must know your chance of winning and the payoff 
odds. Mathematically speaking, you bet the fraction of your bankroll that maximizes
“growth rate.” Growth rate is the expected logarithm of your return, where
return is payoff per dollar bet.

If the payoff odds are 1 to 1 and p is your chance of winning, you should always
bet the fraction 2 × p − 1 of your bankroll. For example, with blackjack win
probability = .51, the Kelly system says bet the fraction 2 × .51 − 1 = .02 of your
bankroll. If your bankroll is $500, bet $10. If your bankroll is $5,000, bet $100. If 
you have $3 million, bet $60,000. If you have $3 million and the betting limit is 
$15,000, bet the limit. (Note: This is a simplification. Good blackjack play requires 
card counting, with bet increases when the deck favors the player). 
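
For even-money bets the Kelly rule reduces to a one-line computation. The Python sketch below is a minimal illustration; the function name kelly_bet and its limit parameter are our own, and the code covers only the 1 to 1 payoff case discussed here.

```python
def kelly_bet(bankroll, p, limit=None):
    """Kelly bet for 1 to 1 payoff odds: the fraction 2p - 1 of the bankroll,
    or nothing when there is no edge. limit is a hypothetical house maximum."""
    fraction = max(2.0 * p - 1.0, 0.0)
    bet = fraction * bankroll
    if limit is not None:
        bet = min(bet, limit)
    return bet


for bankroll in (500, 5_000, 3_000_000):
    print(bankroll, round(kelly_bet(bankroll, 0.51), 2))

print(kelly_bet(3_000_000, 0.51, limit=15_000))  # capped at the betting limit
print(kelly_bet(3_000_000, 0.493))               # pass line: don't bet
```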


HOURLY WAGE—For any betting strategy, it may be useful to compute the ratio 
of the expected profit to the expected duration of play. This will give an estimate 
of your expected hourly wage. To do this, you need to know the average number of 
bets in a betting sequence. In the gambler’s ruin sequence, the average number of 
bets it takes to either reach your goal or go broke is given by the formula (Feller,
page 348)

\[
B = \frac{i}{q - p} - \frac{N}{q - p}\left(\frac{1 - (q/p)^{i}}{1 - (q/p)^{N}}\right).
\]

For example, if you are playing blackjack, with p = .51 and q = .49, starting 
with the usual i = 200 betting units, and having the goal of N = 500 units, then 
B = 14,992. If you make an average of 1 bet per minute, it will take about 250 
hours, or about one month of 8-hour days and 6-day weeks, to either reach your goal or
go broke. Suppose, like Zarin, you have $3 million, and, unlike Zarin, you make 
$15,000 blackjack bets. Then, your Expected Profit is about $4.5 million, whether 
or not you have free casino credit. Dividing by 250 hours yields an hourly wage of
$18,000. Not bad. Unfortunately, most of us don’t have a $3 million gambling
bankroll (or the ability to win a casino game 51% of the time). If you start with
$3,000 instead of $3 million and make $15 blackjack bets instead of $15,000 bets,
your average hourly wage will be $18 instead of $18,000.

Let’s go back to Zarin and his $3 million, expressed as 200 pass line betting 
units of $15,000. It will take an average of 8149 bets, or about 136 hours, at one bet 
per minute, to either reach the 236 unit goal or go broke. We saw that with free 
credit, Zarin’s Expected Profit from using this strategy was $197,000. His expected 
hourly wage would have been $197,000/136 = $1,449.
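
The duration formula and the hourly wage can be checked together. In the Python sketch below, the names expected_bets and hourly_wage are our own, the pace of 60 bets per hour is the one bet per minute assumed in the text, and p is assumed different from 1/2.

```python
def reach_goal_probability(p, i, N):
    """Gambler's ruin chance of reaching N before 0, starting from i."""
    r = (1.0 - p) / p
    return (1.0 - r**i) / (1.0 - r**N)


def expected_bets(p, i, N):
    """Average number of bets until the goal is reached or the bankroll is gone."""
    q = 1.0 - p
    return i / (q - p) - (N / (q - p)) * reach_goal_probability(p, i, N)


def hourly_wage(p, i, N, unit_dollars, bets_per_hour=60):
    """Expected profit divided by expected hours of play."""
    profit = unit_dollars * (N - i) * reach_goal_probability(p, i, N)
    hours = expected_bets(p, i, N) / bets_per_hour
    return profit / hours


print(round(expected_bets(0.51, 200, 500)), round(hourly_wage(0.51, 200, 500, 15_000)))
print(round(expected_bets(0.493, 200, 236)), round(hourly_wage(0.493, 200, 236, 15_000)))
```

Because the wage is a ratio of two expectations, it is an estimate of long-run earnings per hour rather than the expected value of profit per hour in a single session.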


WHAT DOES BANKROLL REALLY MEAN?—Say you bring $1,000 to the casino.
Is that your bankroll? Suppose you have another $10,000 in the bank. Should that 
be counted? You can sell your car. Is that part of your bankroll? Maybe you can 
also sell your house and milk your credit cards to the limit. There are many 
possibilities, but if you bloat your bankroll with money not meant for gambling, 
losing can be painful. 


BOTTOM LINE—Zarin should have learned blackjack. If he had mastered a 
strategy giving him a slight edge (such as win probability = .51) and if he had been 
allowed to make large bets under ideal playing conditions, his $3 million credit 
would have allowed him to generate considerable profit. Unfortunately, ideal 
playing conditions are hard to come by for big-betting blackjack players. 


ACKNOWLEDGMENT—The authors would like to thank Richard Beck of the New York Law School 
for informing them of the Zarin case. 
An Envy-Free Cake Division Protocol


Steven J. Brams and Alan D. Taylor 


Our starting point is the well-known parental solution to the problem of dividing a 
cake between two children so that each child thinks he or she has been treated 
fairly: The parent instructs one of the children to cut the cake into two pieces in 
any way he desires. The other child is then instructed to choose whichever piece 
she prefers. This two-step sequence of instructions, known as “cut-and-choose,” 
provides a simple example of the kind of game-theoretic algorithm that Even and 
Paz [11] call a “protocol.” 

Associated with the cut-and-choose protocol is a natural strategy for each child: 
The first child cuts the cake into two pieces that he considers to be equal, and the 
second child chooses a piece that she considers to be at least as large as the other 
piece. Notice that each child’s strategy guarantees him or her “satisfaction,” 
regardless of what the other child does. 

The general version of this problem involves n people (‘players’), each of 
whom has his or her preferences over subsets of the cake given by a probability 
measure.¹ An allocation of the cake among the players is said to be proportional if
each player receives a piece of size at least 1/n (in his or her own measure), and it 
is said to be envy-free if each player receives a piece he or she would not swap for 
that received by any other player. It is easy to see that an envy-free allocation is 
proportional, but the converse fails unless n = 2. Thus, for example, every one of 
three players may think his or her piece is at least 1/3, but a player may think that 
one of the other players has a larger piece. 
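
To make the two definitions concrete, here is a small Python sketch that tests a given allocation against each of them. The representation is our own assumption: values[i][j] holds player i's measure of the piece handed to player j, and exact fractions are used to avoid rounding. The example matrix is invented for illustration.

```python
from fractions import Fraction as F


def is_proportional(values):
    """Every player values his or her own piece at 1/n or more."""
    n = len(values)
    return all(values[i][i] * n >= 1 for i in range(n))


def is_envy_free(values):
    """No player values another player's piece above his or her own."""
    n = len(values)
    return all(values[i][i] >= values[i][j] for i in range(n) for j in range(n))


# Hypothetical three-player allocation: row i is player i's view of the 3 pieces.
example = [
    [F(2, 5), F(1, 2), F(1, 10)],  # player 0 has 2/5 but thinks piece 1 is 1/2
    [F(1, 4), F(1, 2), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
]

print(is_proportional(example), is_envy_free(example))
```

In this example every player holds at least 1/3 by his or her own measure, yet player 0 would rather have player 1's piece, so the allocation is proportional without being envy-free.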

The results on proportional and envy-free allocations obtained over the past 50 
years tend to fall into one of four classes: (i) Existence Theorems; (ii) Moving-Knife 
Solutions; (iii) Algorithms; and (iv) Protocols. We say something about each in 
turn. 

Existence theorems, dating back to the 1940s, are often based on some version 
of Liapounoff’s Convexity Theorem [20]. Typically, they establish the existence of 
an ordered partition of the cake corresponding to an envy-free allocation, often 
with some additional property such as: all the measures of all the pieces are exactly 
1/n [21]; or the pieces are connected sets [27 and 31]; or the allocation is also 
Pareto-optimal [30 and 4]. In the words of Rebman [24, p. 33], however, these 
results provide ‘“‘no clue as to how to accomplish such a wonderful partition.” 

¹If one wants to abandon the cake metaphor, and literally work with arbitrary probability measures
on some set C, then even cut-and-choose can fail. For example, if both children have their preferences
given by the same 0-1 valued measure, i.e., the same ultrafilter, then both will want the same piece
regardless of how the cake is divided.

For everything we do in the present paper, it suffices to assume that our measures are all defined on
the same algebra 𝒮 of subsets of C and satisfy the following two properties: (i) for every set P ∈ 𝒮
and every finite k, P can be partitioned into k sets of equal measure, and (ii) for every P, Q ∈ 𝒮,
either P can be trimmed to yield a subset the same size as Q, or vice-versa.

There are two well-known moving-knife solutions. The first is due to Dubins
and Spanier [10] and is a moving-knife version of the Banach-Knaster last-diminisher
scheme for n people that we shall present later. (The Dubins-Spanier
scheme is easy to describe: a knife is slowly moved along the top of the cake so 
that all the slices made are parallel. Each player calls “cut” when he or she is 
willing to take the resulting piece as his or her allocation.) The second is a 
moving-knife scheme due to Stromquist [27] that yields an envy-free allocation 
among three people. (This one is not so easy to describe, because envy-freeness is 
considerably harder to obtain than proportionality.) 

Moving-knife schemes, however, are not the step-by-step processes one usually 
associates with the term “algorithm.” A good example of what one would call an 
algorithm in this context is Woodall’s scheme [32] for producing an allocation 
whereby each participant gets strictly more than 1/n of the cake (according to his 
own measure). This algorithm requires, as part of the input, a piece P of cake and 
two distinct numbers α and β such that Player 1 thinks the measure of P is α, and
Player 2 thinks the measure of P is β. An envy-free version of this algorithm is in
[5]. 

Finally, there is what Even and Paz [11] call a “protocol,” and this is the only 
kind of result we are going to analyze in the present paper. Since we will be 
presenting examples of protocols, as opposed to proving their non-existence, we 
can afford the same level of informality in the description of what is meant by a 
protocol as Even and Paz used.

A protocol is a computer-programmable interactive procedure that can issue 
queries to the participants whose answers may affect future decisions. It may issue 
instructions to the participants such as: “Choose k pieces from among these m 
pieces” or “Partition this piece into k subpieces.” The protocol has no information 
on the measure of the various pieces as seen by the different participants; this is
private information. Moreover, if the participants obey the protocol, then each 
participant will end up with a piece after finitely many steps. 

Still following [11], we define a strategy for a participant to be an adaptive 
sequence of moves consistent with the protocol, which the participants choose 
sequentially when called upon by the protocol. A protocol is proportional if each 
of the n participants has a strategy that will guarantee him at least 1/n of the cake
(by his own measure), independently of the other participants’ strategies. (Purely 
for convenience, we will henceforth use only the masculine pronoun.) Departing 
from [11], we will call a protocol envy-free if each of the n participants has a 
strategy that will guarantee him a piece that is, according to his own measure, at 
least tied for largest. 

A constructive proof of the existence of, say, a proportional protocol involves 
producing three separate things: the rules of the protocol, a strategy for each of 
the players, and an argument that the strategies do, in fact, guarantee each player 
his proportional share. We distinguish rules and strategies by demanding that rules 
be enforceable by a referee implementing the protocol. 

This means that a statement like “Player 1 cuts the cake into n pieces” is an
acceptable rule, whereas a statement like “Player 1 cuts the cake into n pieces that 
he considers to be equal” is not. This is because the latter statement cannot be 
enforced by the referee, who has no knowledge of Player 1’s measure and so 
cannot tell if the rule has been followed or not. 

In presenting protocols, we will separate rules from strategies by placing all 
strategic aspects in parentheses. This provides one with the option of reading the 
rules alone in a reasonably smooth way. All arguments that the strategies perform 
as advertised are placed between steps and labeled as “Aside.” For example, in 
our method of presentation, cut-and-choose becomes: 
Cut-and-Choose


Step 1. Player 1 cuts the cake into 2 pieces (that he considers to be the same 
size). 

Step 2. Player 2 chooses a piece (that she considers to be at least tied for 
largest). 

Aside. Clearly, Player 1’s strategy guarantees him a piece of size exactly 1/2 
in his measure, while Player 2’s strategy guarantees her a piece of size 
at least 1/2 in her measure. 
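
A small simulation may help fix ideas. The Python sketch below plays cut-and-choose with both players following the parenthetical strategies. The modeling choices are ours and not part of the protocol: the cake is the interval [0, 1], each player's measure is given by a piecewise-constant density on equal-width segments, and the helper names value, cut_point, and cut_and_choose are hypothetical.

```python
from fractions import Fraction


def value(density, a, b):
    """A player's measure of [a, b]; density[k] applies on [k/m, (k+1)/m]."""
    m = len(density)
    total = Fraction(0)
    for k, d in enumerate(density):
        overlap = min(b, Fraction(k + 1, m)) - max(a, Fraction(k, m))
        if overlap > 0:
            total += d * overlap
    return total


def cut_point(density, a, target):
    """Leftmost x >= a such that the player's measure of [a, x] equals target."""
    m = len(density)
    remaining = target
    for k, d in enumerate(density):
        lo, hi = max(a, Fraction(k, m)), Fraction(k + 1, m)
        if hi <= lo or d == 0:
            continue  # segment lies left of a, or is worthless to this player
        if d * (hi - lo) >= remaining:
            return lo + remaining / d
        remaining -= d * (hi - lo)
    raise ValueError("target exceeds the player's measure of [a, 1]")


def cut_and_choose(cutter, chooser):
    zero, one = Fraction(0), Fraction(1)
    # Step 1: the cutter halves the cake by his own measure.
    x = cut_point(cutter, zero, value(cutter, zero, one) / 2)
    left, right = (zero, x), (x, one)
    # Step 2: the chooser takes a piece she considers at least tied for largest.
    if value(chooser, *left) >= value(chooser, *right):
        return {"chooser": left, "cutter": right}
    return {"chooser": right, "cutter": left}


cutter_density = [2, 2, 0, 0]    # cares only about the left half of the cake
chooser_density = [1, 1, 1, 1]   # values the cake uniformly
pieces = cut_and_choose(cutter_density, chooser_density)
print(pieces)
```

In this run the cutter ends up with a piece worth exactly 1/2 to him, while the chooser's piece is worth more than 1/2 to her, in line with the Aside.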


The modern era of cake cutting began with Steinhaus’ observation “during the
war” [World War II] [25, p. 102] that the cut-and-choose protocol could be
extended to yield a proportional protocol for three players (see [18]). He then
asked if it could be extended to yield a proportional protocol for the case n > 3.
(Steinhaus, however, never used the word “protocol.”) His question was answered
in the affirmative by Banach and Knaster and reported in [25] and [26]. The
Steinhaus and Banach-Knaster protocols introduced two key ideas that would
resurface in the envy-free solutions 15 and 50 years later.

The first idea was that of having an initial sequence of steps resulting in only 
part of the cake’s being allocated (to one player in this case). The sequence is then 
repeated a finite number of times, after which the entire cake has been allocated. 
The second idea—and perhaps the more important of the two—was that of having 
a player trim a piece to a smaller size. 

Explicit mention of the lack of a constructive procedure for producing an 
envy-free allocation among more than two people dates back at least to Gamow 
and Stern [14]. The first breakthroughs on this problem occurred in the late 1950s 
and early 1960s, when the protocol solution to the envy-free problem for n = 3 
was found by John L. Selfridge, and rediscovered independently by John H. 
Conway. These solutions also involved trimming and an initial allocation of only 
part of the cake; they were widely disseminated by R. K. Guy and others, and 
eventually reported by Gardner [15], Woodall [32], Stromquist [27], and Austin [1]. 
The moving-knife solution of Stromquist [27] was found two decades later, as was a 
scheme due to Levmore and Cook [19], which can be recast as quite a different 
moving-knife solution to the envy-free problem when n = 3. Still other envy-free 
moving-knife schemes for three people [7] and, more recently, four people [8] have 
been discovered and are summarized in [6]. 

The extension of the Selfridge-Conway protocol to the case of even four people 
has remained an open, and much-commented upon, problem. See, for example, 
Gardner [15], Rebman [24], Stromquist [27], Woodall [31], [32], Bennett et al. [3],
Webb [29], Hill [16], [17], and Olivastro [22]. In what follows, we solve this problem 
by producing an envy-free protocol for arbitrary n. 

We have chosen a uniform presentation of four protocols that highlights the 
evolution of two important ideas—namely, trimming, and the use of sequences of 
partial allocations. Historically, these four protocols arose over a period of 50 years 
and nicely illustrate how ideas in mathematics are built, one upon another. The 
protocols we present are: 


1. The proportional protocol for n = 3 (Steinhaus).
2. The proportional protocol for arbitrary n (Banach-Knaster).
3. The envy-free protocol for n = 3 (Selfridge, Conway).
4. The envy-free protocol for arbitrary n.
Before turning to the protocols themselves, we must acknowledge the help of
several people. Our interest in fair division was sparked by Olivastro [22]. Valuable 
mathematical contributions were made by William Zwicker and Fred Galvin. 
Indeed, the present version of our envy-free protocol owes much to the reworking 
of an earlier version by Galvin. 

Specific observations and comments by David Gale, Sergiu Hart, Theodore Hill, 
Walter Stromquist, William Webb, and Douglas Woodall also proved helpful. In 
addition, we have benefited from conversations and correspondence with the 
following people: Ethan Akin, Julius Barbanel, John Conway, Morton Davis, Karl 
Dunz, Shimon Even, A. M. Fink, Peter Fishburn, Martin Gardner, Richard Guy, 
D. Marc Kilgour, Peter Landweber, Jerzy Legut, Hervé Moulin, Dominic 
Olivastro, Barry O’Neill, Philip Reynolds, William Thomson, Hal Varian, Charles 
Wilson, and H. Peyton Young. 

The first protocol we present is a generalization of cut-and-choose to a propor- 
tional protocol for three people. This is the one found by Steinhaus during World 
War II. 


The Proportional Protocol for n = 3 
(Steinhaus, circa 1943) 


Step 1. Player 1 cuts the cake into 3 pieces (that he considers to be the same 
size). 

Step 2. Player 2 is given the choice of either passing, i.e., doing nothing (which 
he does if he thinks 2 or more of the pieces are of size at least 1/3), or 
not passing and labeling 2 of the pieces (that he thinks are of size 
strictly less than 1/3) as “bad.” 

Step 3. If Player 2 passed in Step 2, then Players 3, 2, and 1, in that order,
choose a piece (that they consider to be of size at least 1/3). 

Aside. In this case, each player receives a piece of size at least 1/3 in his own 
measure. This is true of: Player 3, because he chooses first; Player 2, 
because he thinks either 2 or 3 pieces are that large, and so at least 
one of them will still be available after Player 3 chooses his piece; and 
Player 1, because he made all 3 pieces of size 1/3. 

Step 4. If Player 2 did not pass at Step 2, then Player 3 is given the same two 
options that Player 2 had at Step 2. He ignores Player 2’s labels. 

Step 5. If Player 3 passed in Step 4, then Players 2, 3, and 1, in that order, 
choose a piece (that they consider to be of size at least 1/3). 

Aside. In this case, as before, each player receives a piece of size at least 1/3 
in his own measure. 

Step 6. If Player 3 did not pass at Step 4, then Player 1 is required to take a 
piece that both Player 2 and Player 3 labelled as “bad.” 

Aside. Note first that there certainly must be such a piece. At this point,
Player 1 has received a piece that he thinks is of size exactly 1/3,
which both Player 2 and Player 3 think is “bad,” i.e., of size strictly
less than 1/3.

Step 7. The other two pieces are reassembled, and Player 2 cuts the resulting 
piece into two pieces (that he considers to be the same size). 

Step 8. Player 3 chooses one of the two pieces (that he considers to be at least 
tied for largest). 

Step 9. Player 2 is given the remaining piece. 

Aside. This is just cut-and-choose between Players 2 and 3, which ends the 
protocol. 
The second protocol we present followed quickly on the heels of the first. It is
the Banach-Knaster protocol, offered in response to Steinhaus’ question of whether
his result could be extended from 3 to n people. Note here the introduction of the
idea of trimming, which will be further exploited in both of the upcoming envy-free
protocols.


Proportional Protocol for Arbitrary n
(Banach-Knaster, circa 1944)


Step 1. Player 1 cuts a piece P_1 (of size 1/n) from the cake.

Step 2. Player 2 is given the choice of either passing (which he does if he
thinks P_1 is of size less than 1/n), or trimming a piece from P_1 to
create a smaller piece (that he thinks is of size exactly 1/n). The piece
P_1, now perhaps trimmed, is renamed P_2. The trimmings are set aside.

Step 3. For 3 ≤ i ≤ n, Player i takes the piece P_{i−1} and proceeds exactly as
Player 2 did in Step 2, with the resulting piece now called P_i.

Aside. For 1 ≤ i ≤ n, Player i thinks that P_i is of size less than or equal to
1/n. We also have that P_1 ⊇ ··· ⊇ P_n. Thus, every player thinks P_n
is of size at most 1/n.

Step 4. The last player to trim the piece, or Player 1 if no one trimmed it, is
given P_n.

Aside. The player receiving P_n thinks it is of size exactly 1/n.

Step 5. The trimmings are reassembled, and Steps 1-4 are repeated for the
remainder of the cake, and with the remaining n − 1 players in place
of the original n players.

Aside. The player who gets a piece at this second stage is getting exactly
1/(n − 1) of the remainder of the cake; he, and everyone else, thinks
this remainder is of size at least (n − 1)/n. Hence, he thinks his piece
is of size at least 1/n.

Step 6. Step 5 is iterated until there are only 2 players left. The last 2 players
use cut-and-choose.

Aside. As before, each player receives a piece that he thinks is of size at least
1/n.


This ends the protocol.
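
The Banach-Knaster steps can be sketched in Python under simplifying assumptions of our own: the cake is the interval [0, 1], each player's measure is a piecewise-constant density on equal-width segments (positive somewhere on every remainder), pieces are cut from the left end, and trimming means moving the right edge of the piece leftward. The helper names value, cut_point, and last_diminisher are hypothetical, and every player is assumed to follow the parenthetical strategies.

```python
from fractions import Fraction


def value(density, a, b):
    """A player's measure of [a, b]; density[k] applies on [k/m, (k+1)/m]."""
    m = len(density)
    total = Fraction(0)
    for k, d in enumerate(density):
        overlap = min(b, Fraction(k + 1, m)) - max(a, Fraction(k, m))
        if overlap > 0:
            total += d * overlap
    return total


def cut_point(density, a, target):
    """Leftmost x >= a such that the player's measure of [a, x] equals target."""
    m = len(density)
    remaining = target
    for k, d in enumerate(density):
        lo, hi = max(a, Fraction(k, m)), Fraction(k + 1, m)
        if hi <= lo or d == 0:
            continue
        if d * (hi - lo) >= remaining:
            return lo + remaining / d
        remaining -= d * (hi - lo)
    raise ValueError("target exceeds the player's measure of [a, 1]")


def last_diminisher(densities):
    """Return {player index: (a, b)} for n >= 2 players."""
    one = Fraction(1)
    players = list(range(len(densities)))
    left, shares = Fraction(0), {}

    def fair_mark(p, k):
        # Right edge of the piece that player p considers exactly 1/k of the rest.
        return cut_point(densities[p], left, value(densities[p], left, one) / k)

    while len(players) > 2:
        k = len(players)
        owner, right = players[0], fair_mark(players[0], k)   # Step 1
        for p in players[1:]:                                 # Steps 2 and 3
            mark = fair_mark(p, k)
            if mark < right:          # piece looks too big to p, so he trims it
                owner, right = p, mark
        shares[owner] = (left, right)                         # Step 4
        players.remove(owner)
        left = right                  # Step 5: trimmings rejoin the remainder
    cutter, chooser = players                                 # Step 6
    mid = fair_mark(cutter, 2)
    if value(densities[chooser], left, mid) >= value(densities[chooser], mid, one):
        shares[chooser], shares[cutter] = (left, mid), (mid, one)
    else:
        shares[chooser], shares[cutter] = (mid, one), (left, mid)
    return shares


half = Fraction(1, 2)
densities = [[1, 1, 1], [2, half, half], [half, half, 2]]
shares = last_diminisher(densities)
print(shares)
```

In this example Player 2 (index 1) trims the first piece and receives it, and the other two finish with cut-and-choose; each share is worth at least 1/3 to its owner.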


The next protocol we present is the Selfridge-Conway envy-free protocol for the 
case n = 3. (There are slight differences in the presentations of Selfridge and 
Conway; we follow the latter.) This protocol involves an elegant combination of the 
trimming idea introduced by Banach-Knaster and the basic framework that Steinhaus
used. It also introduces the important notion of one player’s having an
“irrevocable advantage” over another player, following a partial allocation. 


Envy-Free Protocol for n = 3
(Selfridge, Conway, circa 1960)

Step 1. Player 1 cuts the cake into 3 pieces (that he considers to be the same
size).

Step 2. Player 2 is given the choice of either passing (which he does if he
thinks two or more pieces are tied for largest), or trimming a piece
from (the largest) one of the three pieces (to create a tie for largest). If
Player 2 trimmed a piece, then the trimmings are named L, for
“leftover,” and set aside.

Step 3. Players 3, 2, and 1, in that order, choose a piece (that they consider to
be at least tied for largest) from among the 3 pieces, one of which may 
have been trimmed in Step 2. If Player 2 did not pass in Step 2, then 
he is required to choose the piece he trimmed if Player 3 did not. 

Aside. Notice that only part of the cake has been allocated. This yields a 
partition $\{X_1, X_2, X_3, L\}$ of the cake such that $\{X_1, X_2, X_3\}$ is an
envy-free partial allocation. The lack of envy is true of: Player 3, 
because he chooses first; Player 2, because he made at least two pieces 
tied for largest, and so at least one of them will still be available after 
Player 3 chooses his piece; and Player 1, because he made all three 
pieces of size 1/3, and the trimmed one has definitely been taken by 
either Player 3 or Player 2. 

Step 4. If Player 2 passed at Step 2, we are done. Otherwise, either Player 2 or 
Player 3 received the trimmed piece, and the other received an 
untrimmed piece. Whichever player received the untrimmed piece 
now divides L into 3 pieces (that he considers to be the same size). 
Call this player the “cutter” and the other the “non-cutter.” 

Aside. We will refer to Player 1 as having an irrevocable advantage over the 
non-cutter. The point is that, since the non-cutter received the trimmed 
piece, Player 1 will not envy the non-cutter, regardless of how L is 
later divided among the three. 

Step 5. The three pieces into which L is divided are now chosen by the 
players in the order: non-cutter first; Player 1 second; cutter third. 
(Each chooses a piece at least tied for largest among those available to 
him when it is his turn to choose.) 

Aside. At this point, the entire cake has been allocated. Since the non-cutter 
chooses his piece of L first, he experiences no envy. Player 1 does not 
envy the non-cutter, since he had an irrevocable advantage over him, 
and Player 1 does not envy the cutter, because he is choosing his piece 
of L before the cutter does. Finally, the cutter experiences no envy 
since he divided L into three equal pieces. 

This ends the protocol. 
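A compact simulation may help in following Steps 1-5. It is a sketch under assumptions that are not in the text: the cake consists of a few homogeneous cells, a piece is recorded as the fraction of each cell it contains, and each player's measure is given by positive per-cell values. The helpers `val`, `split`, `best`, and `selfridge_conway` are hypothetical names for this illustration.

```python
from fractions import Fraction

def val(d, piece):
    """Worth of a piece (fraction of each cell held) under per-cell values d."""
    return sum(dk * qk for dk, qk in zip(d, piece))

def split(d, piece, target):
    """Carve from piece a part worth exactly target under d; return (part, rest)."""
    part, rest, need = [], [], Fraction(target)
    for dk, qk in zip(d, piece):
        take = min(qk, need / dk)
        part.append(take)
        rest.append(qk - take)
        need -= dk * take
    return part, rest

def best(d, pieces, avail):
    """Index of an available piece that is largest (or tied for largest) under d."""
    return max(sorted(avail), key=lambda i: val(d, pieces[i]))

def selfridge_conway(d1, d2, d3):
    dens = {1: d1, 2: d2, 3: d3}
    cake = [Fraction(1)] * len(d1)
    third = val(d1, cake) / 3                      # Step 1: Player 1 cuts three equal pieces
    a, rest = split(d1, cake, third)
    b, c = split(d1, rest, third)
    pieces = [a, b, c]
    ranked = sorted(range(3), key=lambda i: val(d2, pieces[i]), reverse=True)
    big, second = ranked[0], ranked[1]
    trimmed, leftover = None, None
    if val(d2, pieces[big]) > val(d2, pieces[second]):   # Step 2: Player 2 trims to a tie
        pieces[big], leftover = split(d2, pieces[big], val(d2, pieces[second]))
        trimmed = big
    avail = {0, 1, 2}                              # Step 3: Players 3, 2, 1 choose
    c3 = best(d3, pieces, avail)
    avail.remove(c3)
    c2 = trimmed if trimmed in avail else best(d2, pieces, avail)
    avail.remove(c2)
    c1 = avail.pop()
    bundle = {1: pieces[c1], 2: pieces[c2], 3: pieces[c3]}
    if leftover is None:
        return bundle
    non_cutter = 3 if c3 == trimmed else 2         # Step 4: holder of the trimmed piece
    cutter = 5 - non_cutter
    t = val(dens[cutter], leftover) / 3
    x, rest = split(dens[cutter], leftover, t)
    y, z = split(dens[cutter], rest, t)
    parts, avail = [x, y, z], {0, 1, 2}
    for p in (non_cutter, 1, cutter):              # Step 5: non-cutter, Player 1, cutter
        i = best(dens[p], parts, avail)
        avail.remove(i)
        bundle[p] = [u + v for u, v in zip(bundle[p], parts[i])]
    return bundle

d = {1: [1, 2, 3, 4], 2: [4, 3, 2, 1], 3: [1, 5, 1, 5]}
bundle = selfridge_conway(d[1], d[2], d[3])
for i in (1, 2, 3):
    print(i, [str(val(d[i], bundle[j])) for j in (1, 2, 3)])
```

Each printed row lists how one player values the three final bundles; envy-freeness means the player's own entry is never exceeded in his row.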


The final protocol we present is our envy-free protocol for an arbitrary number 
of players. This result was announced in [9, 12, 13, 23]. A brief discussion of some 
important differences between this protocol and the three earlier ones, and a 
couple of important open questions, follow. 

The central feature of our envy-free protocol, like that for the n = 3 protocol, is 
that players trim pieces of the cake to create ties, rendering them indifferent 
among these pieces. When n > 3, however, one needs to start the trimming and 
choosing process—leading to an envy-free partial allocation—with more pieces 
than there are players. 

As an informal illustration of how to achieve an envy-free partial allocation, 
suppose there are four people. Have Player 1 cut the cake into 5 equal pieces. 
Player 2 then trims 2 pieces, creating a 3-way tie for largest. Player 3 then trims 1 
piece, creating a 2-way tie for largest. The players now choose in the order: Player 
4, Player 3, Player 2, Player 1, with the middle two players required to take a piece 
they trimmed if one is available. Clearly, each player thinks his piece is at least tied 
for largest. The burden of our demonstration of the n-person envy-free protocol is 
to show that a full allocation of the entire cake can be accomplished in a finite 
number of steps.

For simplicity, we will present only the n = 4 version of the envy-free protocol.
The extension to arbitrary n is fairly straightforward and left to the reader. In 
outline form, the protocol goes as follows: 

One player (chosen here to be Player 2 for later notational simplicity) cuts the
cake into 4 equal pieces, hands these out, and asks if anyone objects. If, say, Player
1 objects, then Players 1 and 2 (alone) go through several steps which yield six sets
(the Ys and Zs in Step 7 below) to be used as a starting partition (in place of the
five equal pieces) for the kind of trim-and-choose sequence among all four players
that we illustrated two paragraphs earlier. This trim-and-choose sequence is
repeated again and again until we arrive at a partial allocation in which Player 1
has an irrevocable advantage over Player 2 (the “aside” after Step 15 below). From
this point on, we never have to worry about Player 1’s objecting because of envy for 
Player 2. Repeating this at most once for each pair of players results in an 
envy-free allocation of the entire cake after finitely many steps. 


Envy-Free Protocol for Arbitrary n
(the n = 4 version) 
(1992) 


Step 1. Player 2 cuts the cake into 4 pieces (that he considers to be the same 
size), keeps one piece, and hands one piece to each player. 

Step 2. Each of the other three players is asked whether or not he objects to 
this allocation. (A player objects iff he envies some other player.) 

Step 3. If no one objects, then each keeps the piece he was given in Step 1, 
and we are done. 

Step 4. Otherwise, we choose the smallest i so that Player i objected. For 
notational simplicity, assume i = 1. Player 1 now chooses a piece 
originally given to some other player (whom he envied) and calls that 
piece A. The piece originally given to Player 1 is called B. 

Aside. Once we have A and B, the other two pieces in the allocation from 
Step 1 are reassembled. That part of the cake will be allocated later. 
Note that Player 1 thinks A is larger than B. Player 2 thinks A and 
B are the same size. 

Step 5. Player 1 now names a positive integer r > 10 (chosen so that, for any 
partition of A into r sets, Player 1 will prefer A, even with the 7 
smallest—according to Player 1—pieces in the partition of A re- 
moved, to B). 

Aside. Player 1 can easily choose such an r. That is, the union of the 7 
smallest pieces is certainly no larger than 7 times the average size of 
all r pieces. Hence, Player 1 simply chooses r large enough so that 
$7\mu(A)/r < \mu(A) - \mu(B)$, where $\mu$ is his measure.

Step 6. Player 2 now partitions A into exactly r sets (that he considers to be 
the same size), and does the same to B. 

Step 7. Player 1 chooses (the smallest) 3 sets from the partition of B and
names these $Z_1$, $Z_2$, $Z_3$. He also chooses either (the largest) 3 sets
from the partition of A (if he thinks these are all strictly larger than
all the Zs), and trims at most 2 of these (to the size of the smallest
among the three), or he partitions (the largest) one of the sets in the
partition of A into 3 pieces (that he considers to be the same size). In
either case, he names these $Y_1$, $Y_2$, $Y_3$.

Aside. Player 1’s strategy in Step 7 guarantees that he will think all three Ys 
are the same size, and each strictly larger than all three Zs. This is
true even if he chooses the second option.* Player 2 thinks all three
Zs are the same size, and each is at least as large as all three Ys.

Step 8. Player 3 takes the collection of 6 pieces, and either passes (if he
thinks there already is at least a 2-way tie for largest), or trims (the
largest) one of these (to the size of the next largest, thus creating at
least a 2-way tie for largest).

Step 9. Players 4, 3, 2, and 1, in that order, choose a piece from among the 6 
Ys and Zs as modified in Step 8 (that they consider to be largest or 
tied for largest), with Player 3 required to take the piece he trimmed 
if it is available. Player 2 must choose $Z_1$, $Z_2$, or $Z_3$. Player 1 must
choose $Y_1$, $Y_2$, or $Y_3$.

Aside. This yields a partition $\{X_1, X_2, X_3, X_4, L_1\}$ of the cake such that
$\{X_1, X_2, X_3, X_4\}$ is an envy-free partial allocation, and $L_1$ is the
leftover piece. Moreover, Player 1 thinks his piece $X_1$ is strictly larger
(say by $\varepsilon$) than Player 2’s piece $X_2$.

Step 10. Player 1 names a positive integer s (chosen so that $(4/5)^s \mu_1(L_1) < \varepsilon$,
where $\mu_1$ is Player 1’s measure).

Aside. The integer s specifies how many times the players will iterate the 
basic trim-and-choose sequence to follow. Notice that if the rules 
were instead to allow the iterations to continue until Player 1 said 
“stop” (which he could strategically do at the point at which he 
thinks the leftover crumb is smaller than the advantage he has over 
Player 2), then there is no guarantee that a strategically misguided 
Player 1 would not keep the game going forever. 

Step 11. Player 1 cuts $L_1$ into 5 pieces (that he considers to be the same size).

Step 12. Player 2 takes the collection of 5 pieces, selects (the largest) 3 pieces, 
and trims (the largest) 2 or fewer of these (to the size of the smallest, 
thereby creating at least a 3-way tie for largest). 

Step 13. Player 3 takes the collection of 5 pieces, perhaps trimmed in step 12, 
selects (the largest) two, and trims, if he wants to, (the largest) one of 
these (to the size of the smallest, thus creating at least a 2-way tie for 
largest). 

Step 14. Players 4, 3, 2, and 1, in that order, choose a piece (that they consider
to be largest or tied for largest), with Players 3 and 2 required to take
a piece they trimmed if one is available.

*The proof runs as follows: We are assuming that both A and B have been partitioned into r
pieces, and that B is not only smaller than A but smaller even than A with the smallest 7 pieces of A’s
partition removed. Arrange the sets in both partitions from largest to smallest as $A_1, A_2, \ldots, A_r$ and
$B_1, B_2, \ldots, B_r$. Let $\mu$ denote Player 1’s measure, and suppose, for contradiction, that both of the
following hold:

1. $\mu(B_{r-2}) \ge \mu(A_3)$, which holds if $A_1$, $A_2$, and $A_3$ are not all strictly larger than $B_{r-2}$, $B_{r-1}$,
and $B_r$.

2. $\mu(B_{r-2}) \ge \mu(A_1)/3$, which holds if $A_1$ cannot be partitioned into 3 sets all larger than $B_{r-2}$,
$B_{r-1}$, and $B_r$.

It follows from 1 that:

3. $\mu(B_7 \cup \cdots \cup B_{r-3}) \ge \mu(A_3 \cup \cdots \cup A_{r-7})$, since there are $r - 9$ sets in each union, and the
smallest one of the Bs is at least as large as the largest one of the As.

It follows from 2 that:

4. $\mu((B_1 \cup B_2 \cup B_3) \cup (B_4 \cup B_5 \cup B_6)) \ge \mu(A_1 \cup A_2)$,
since each of the blocks of 3 Bs is larger than each of the As.

But 3 and 4 clearly demonstrate that:

5. $\mu(B) \ge \mu(A_1 \cup \cdots \cup A_{r-7})$.

This is the desired contradiction since the set on the right is A with the smallest 7 pieces of its partition
removed.

Step 15. Steps 11-14 are repeated s — 1 more times, with each application of 
these four steps applied to the leftover piece from the preceding 
application. 

Aside. This yields a partition $\{X_1', X_2', X_3', X_4', L_2\}$ of the cake such that
$\{X_1', X_2', X_3', X_4'\}$ is an envy-free partial allocation, and such that
Player 1 thinks that $X_1'$ is larger than $X_2' \cup L_2$. We now declare
that Player 1 has an irrevocable advantage over Player 2, and we
begin creation of a subset of $\{1, 2, 3, 4\} \times \{1, 2, 3, 4\}$, which we call
“$\mathcal{IA}$” for “irrevocable advantage,” by putting $(1, 2) \in \mathcal{IA}$.

Step 16. Player 2 cuts $L_2$ into 12 pieces (that he considers to be the same
size).

Step 17. Each of the other players declares himself to be of type A (if he
agrees all the pieces are the same size), or type D (if he disagrees).
Player 2 is declared to be of type A.

Step 18. If $D \times A \subseteq \mathcal{IA}$, then we give the 12 pieces to the players in A, with
each of them receiving the same number of pieces. In this case, we
are done.

Step 19. Otherwise, we choose the lexicographically least pair (i, j) from
$D \times A$ that is not in $\mathcal{IA}$, and we return to Step 4 with Player i in
the role of Player 1, Player j in the role of Player 2, and $L_2$ in place
of the cake.

Step 20. Steps 5-18 are repeated.

Aside. Each time we pass through Step 15, we add an ordered pair to $\mathcal{IA}$.
Notice that since $D \times A \subseteq \{1, 2, 3, 4\} \times \{1, 2, 3, 4\}$, and $\mathcal{IA} \subseteq
\{1, 2, 3, 4\} \times \{1, 2, 3, 4\}$, we must have $D \times A \subseteq \mathcal{IA}$ after at most 16
iterations. At this point, we conclude at Step 18 with an envy-free
division of the entire cake.

This ends the protocol. 
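Three pieces of bookkeeping in the protocol are purely arithmetic and can be sketched in code: Player 1's choice of r in Step 5, his choice of s in Step 10, and the selection of the next pair in Steps 18-19. The sketch below is illustrative only. It assumes that the Step 5 condition is $7\mu(A)/r < \mu(A) - \mu(B)$, that the Step 10 condition is read as $(4/5)^s \mu_1(L_1) < \varepsilon$, and that measures are supplied as exact fractions; the function names and the default lower bound `r_min=11` are hypothetical choices made for this example.

```python
from fractions import Fraction
from itertools import product
from math import floor

def choose_r(mu_A, mu_B, r_min=11):
    """Step 5: an integer r >= r_min with 7*mu_A/r < mu_A - mu_B (Player 1's measure)."""
    if mu_A <= mu_B:
        raise ValueError("Player 1 must think A is strictly larger than B")
    # r must strictly exceed 7*mu_A / (mu_A - mu_B)
    return max(r_min, floor(Fraction(7) * mu_A / (mu_A - mu_B)) + 1)

def choose_s(mu_L1, eps):
    """Step 10 (as read here): least s >= 1 with (4/5)**s * mu_L1 < eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    s, bound = 1, Fraction(4, 5) * mu_L1
    while bound >= eps:
        s, bound = s + 1, bound * Fraction(4, 5)
    return s

def next_pair(D, A, IA):
    """Steps 18-19: least (i, j) in D x A not yet in IA, or None if D x A lies inside IA."""
    missing = sorted(pair for pair in product(D, A) if pair not in IA)
    return missing[0] if missing else None

print(choose_r(Fraction(3, 10), Fraction(1, 4)))
print(choose_s(Fraction(1, 2), Fraction(1, 10)))
print(next_pair({1, 3}, {2, 4}, {(1, 2)}))
```

Because s is fixed in advance rather than left to Player 1's discretion during play, the loop in `choose_s` is the only place where his judgment enters, which mirrors the point made in the aside after Step 10.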


There is an important way (pointed out to us by several people) that the 
envy-free protocol for even n = 4 differs from the envy-free protocol for n = 3: 
For n = 3, the number of cuts needed is at most 5, regardless of what the 
measures are. For n = 4, the number of cuts needed can be made arbitrarily large 
by a suitable choice of the four measures (although the moving-knife solution [8] 
for the four-person problem gives a bounded number of cuts). This raises: 


Question 1. Is there a bounded envy-free protocol for n = 4 or n > 4? 

There is another slightly more subtle (and perhaps related) way in which the 
envy-free protocol differs from the others: The three earlier ones also work in the 
context of what are called “CD preference relations” in [2]. (A CD preference 
relation is a complete, reflexive, transitive binary relation that satisfies a partition- 
ing postulate, a trimming postulate, and a weak additivity postulate.) The envy-free 
protocol, on the other hand, requires what is called an “Archimedian CD preference
relation” in [2]. The main result in [2] is the fact that a CD preference relation is 
induced by a finitely additive measure in the obvious way iff it is Archimedian. 
This raises: 


Question 2. Is there an envy-free protocol for n = 4 or n > 4 that works in the 
context of non-Archimedian preference relations? 

It turns out that techniques similar to those used in the n-person envy-free 
protocol can also be used to solve the “chores” problem [15], wherein each player
wants to minimize the amount of cake he or she receives. This and related
questions (e.g., the Pareto-optimality of allocations) are discussed in [6].


The Mathematics Portfolio


Mary L. Crowley and Ken Dunn 


Portfolios are for art students, right? Well... not exclusively. We are proposing the 
portfolio as an academic artifact for the mathematics major. This article describes 
our vision of what a mathematics portfolio could be, how to organize the portfolio, 
items which might be included in a portfolio, and portfolio evaluation. In develop- 
ing these ideas we have considered several portfolio audiences: the student, the 
departmental advisor, the department, and, to some extent, future associates such 
as graduate school departments and employers. 


THE PORTFOLIO CONTENTS. As with the artist’s portfolio, the mathematics 
portfolio should include a collection of the student’s best mathematical work. The 
variety and breadth of the items which are available depends, of course, on the 
mathematical “experiences” the student has encountered and on the reasons for 
assembling the portfolio. There is a place for the traditional products of a 
mathematics program—assignments and tests. When, however, a wide range of 
assessment activities are used, a richer profile of the student’s talents can be 
displayed. This can be achieved through ‘non-traditional’ assignments such as 
journal writings, book reviews, student presentations (captured on audio or video 
tape), group projects, computer based activities, and open-ended investigations. 

As we envision it, however, the portfolio should be more than a display case for 
outstanding work. We want the portfolio, for example, to chronicle each student’s 
mathematical career, rather like a faculty member’s curriculum vitae. Thus, we 
suggest that a list of the mathematics courses the student has taken be included. 
The portfolio also provides the opportunity to guarantee that students are exposed 
to activities which are valued, and which might not be included in any of their 
course requirements, e.g. writing about mathematics, critiquing books or articles 
which are mathematical in nature, or reflecting on their own mathematical 
expectations and progress. Each department will no doubt have special interests 
which they too could address through the portfolio. 

Figure 1 presents a sample suggestion list for a portfolio. Examples from a 
range of mathematical areas (e.g., item 7), spanning several semesters (e.g., item 
4), and reflecting a variety of mathematical experiences are included. Items 3 and 7 
are drawn from “traditional” assessment tasks. Items 1 and 2 are biographical in 
nature. Items 4 and 8 ask students to reflect on their mathematical expectations 
and success, while items 5 and 6 provide students the opportunity to analyze both 
mathematical ideas and presentations. Written responses, not just numerical 
solutions, are required in several items (1, 4,5, 6, 7,8). With item 9, the student is 
given the responsibility of selecting, without direction, something which reflects his 
or her mathematical prowess. Not all of these items would necessarily be included 
in a portfolio and, of course, other items might reflect your program more 
accurately.

1. A mathematical autobiography.

2. An annotated list of courses taken.

3. The student’s honours project (if there is one).

4. A journal covering each semester of the junior and senior year. This could begin with a
description of what the student hopes to get out of the semester, include mathematical high
and low points of the semester, and end with a follow-up as to what was accomplished in the
semester.

5. A critique of one (or more) textbooks.

6. Responses to, or reviews of, articles or books that the students are asked to read over the
period of their program.

7. The best test or assignment from each of the “core” courses that the students must take as
part of their program. A rationale for each choice should be provided.

8. A description of the mathematical insight which most excited the students over their
mathematical career and an explanation why. This might be a theorem, some connections
relating distinct mathematical disciplines, the use of mathematics in a modelling process, a
conjecture, etc.

9. A mathematical item, of their choice. This is the student’s chance to put something in the
portfolio which they have generated, of which they are proud, and which has not otherwise
been requested. A rationale for why the item has been included should accompany the item.
It might be a problem which the student struggled with before solving. It might be something
the student ‘discovered’. It might be an instance where they applied their mathematical
knowledge in another discipline.


Figure 1. Examples of Portfolio Categories 


RATIONALE. The primary goal for asking students to assemble the portfolio is to 
encourage them to reflect on their mathematical flexibility and growth, and to help 
them focus on their mathematical interests. Selecting the items for inclusion 
requires students to review the work they have completed, to think about its 
mathematical value, and to observe how they have matured mathematically. This is 
particularly the case when the portfolio is accumulated over several semesters, or 
even several years. 

The portfolio is also a useful tool for the student’s advisor. From a record 
keeping point of view, for example, a roster of classes can assist in discussion about 
future course selection. (Indeed, for anyone interested in what mathematics 
courses have been completed, it is helpful to have all the courses listed together, 
rather than embedded in the sessional transcripts.) The remaining portfolio 
materials inform the advisor about the student’s concerns and serve as a departure 
point for discussions about mathematical interests in general, as well as more 
practical topics such as career choices. 

Once completed, the portfolio can be an item of value and of interest to several 
constituents. For the students, it is a tangible record of their progress and 
success...more than the transcripts and tattered notebooks we retain from our 
undergraduate careers. Since the portfolio provides direct and concrete evidence 
of a student’s abilities and way of thinking, graduate schools, and perhaps potential 
employers, might also be interested in it. Indeed, the well-crafted portfolio can 
demonstrate the student’s arsenal of problem solving tools, ability to organize and 
communicate mathematically, writing skills, etc. 

The Department will also find it of value to look at the variety and depth of the 
materials represented in each year’s portfolios. If a collection of portfolios is 
reviewed, it can be a constructive way of informing faculty about the departmental
expectations of students, what material is being covered, and what assessment
techniques are being used. We are not implying that portfolios be used to “check”
on the teaching or assessment techniques of faculty members. We are suggesting, 
however, that, in aggregate, portfolios may give a glimpse of what is valued, and 
valuable, in the major program. For many of these same reasons, potential 
students might also find the portfolios of interest. 


ORGANIZING TO USE THE PORTFOLIO. We feel that it is essential that the 
Department, at least those teaching courses from which portfolio materials might 
be drawn, participate in deciding the categories for inclusion in the portfolio and 
the time period over which items will be collected. Minimally, this collaboration 
informs all faculty of what is expected of majors. More importantly, however, it 
also gives each faculty member “ownership” in, and an understanding of, the 
project. At the same time, the discussions which accompany such joint decision 
making should also contribute to reviewing the major program as a whole, and, in 
particular, lead to discussions about goals and assessment techniques. 

In our institution, every mathematics major draws up a plan of study with an 
advisor. At that time, or if you have a forum for meeting all your majors together, 
the portfolio can be introduced and explained. Topics such as the rationale for the
portfolio, the items for inclusion, the dates for completion, and the presentation 
format should be discussed. The introductory process is completed with the 
student and the “department” signing an agreement which specifies deadlines and 
lists the categories from which the portfolio items should be chosen. 

Finally, the student is, of course, responsible for assembling the portfolio. The 
advisor, however, oversees this, checking each semester that appropriate, even if 
only initial or tentative, contributions have been ‘‘deposited’’. By the student’s last 
term, all the material for the portfolio should be selected and assembled. A table 
of contents should then be added as it will greatly facilitate the use (and 
evaluation) of the portfolio. 


EVALUATION. Rather than assign a mark to the portfolio, we suggest that the 
completed portfolio be a requirement for a recommendation for graduation. If 
students see the value of the activity, and the importance put in it by the 
Department, they will want to do a good job. When, however, the assigning of a 
grade is deemed necessary, perhaps evaluation themes could be identified, e.g. the 
diversity of problem solving strategies reflected in the work. Students would be 
informed about the grading criteria at the time the portfolio was introduced and 
this would be included in the agreement between the department and the student. 
The student could then select items for inclusion accordingly. 

As mentioned earlier, a collection of portfolios can serve an important role in 
program evaluation. Information gleaned from studying a sampling of portfolios 
can inform a department about the range of evaluation techniques being used, 
about the level of success students are achieving, about the nature of the students’ 
experiences, about the level of expectations of faculty and students, and about the 
attainment of educational goals. Information obtained in this way can provide a 
unique and previously untapped perspective. 


CONCLUSION. A well considered mathematics portfolio provides faculty and 
students with information about themselves and the program. In deciding what 
should be included, departments and instructors are forced to articulate what they 
value and what their goals are for their majors. Analysing the portfolios provides
information about how successfully these goals are being met. On a more individual
level, departmental advisors can use the portfolio to inform themselves about
student progress and interest. And for the student, generating the portfolio is an 
exercise in self-evaluation, which results in a concrete overview (i.e. the portfolio) 
of his or her undergraduate mathematical experience. 

We see lots of potential for the portfolio. A modified version, for example, can 
easily be implemented in a one semester course. Or, portfolios can be used across 
disciplines. Indeed, we have become so intrigued with the idea that we are 
proposing its use in a new first year program, the Science Foundation Year, which 
is being introduced on our campus. The goals of this program are to promote the 
study of the sciences from an integrated perspective, to address timely and lively 
issues in the field, and to encourage students to reflect on their own thought 
processes. An end-of-the-year portfolio, drawing from all the courses the students 
take, and “extras”, will be required of those enrolled in the program. The items to
be included will support the goals of the program. By its very nature, the portfolio 
provides a framework for coordinating activities and a forum for communication.


Derivative Polynomials For Tangent
and Secant 


Michael E. Hoffman 


1. INTRODUCTION. Sometimes problems naturally occur in pairs, and it’s best 
to tackle both at the same time. For instance, consider the problem of finding the 
nth derivative of tan x. It’s not hard to see that there are polynomials $P_n$ of degree
n + 1 for n = 0, 1, ... so that

\[ \frac{d^n}{dx^n} \tan x = P_n(\tan x). \]

This problem has a natural companion: find the nth derivative of sec x. Here there
are polynomials $Q_n$ of degree n so that

\[ \frac{d^n}{dx^n} \sec x = Q_n(\tan x) \sec x. \]

The $P_n$ and $Q_n$ are different sequences of polynomials, but they are evidently
related. The numbers $P_n(0)/n!$ and $Q_n(0)/n!$ are the coefficients of $x^n$ in the
Maclaurin series for tan x and sec x respectively, and their computation is a
classical problem.

Here’s another problem: for positive integer n and 0 < a < 1, what is the
improper integral

[display equation missing from this copy]

It has a natural companion problem where the denominator is replaced by $e^x + 1$.
As we shall see, this turns out to be essentially the same pair of problems as
considered in the previous paragraph.

This paper has two main parts. First, in §2 and §3 we obtain the polynomials $P_n$
and $Q_n$ as instances of ‘derivative polynomials’ associated with functions f such
that f'(x) is a polynomial function of f(x). Then in §4 we apply this theory to the
computation of improper integrals and infinite series, followed by concluding
remarks in §5.


2. DERIVATIVE POLYNOMIALS. Suppose f is a function whose derivative is a
polynomial in f, i.e. f'(x) = P(f(x)) for some polynomial function P. Then all the
higher derivatives of f are also polynomials in f, so we have a sequence of
polynomials $P_n$ defined by

\[ f^{(n)}(x) = P_n(f(x)), \quad n \ge 0. \]

In fact, the polynomials $P_n$ are determined by the conditions

\[ P_0(u) = u, \quad P_{n+1}(u) = P_n'(u) P(u) \ \text{ for } n \ge 0. \tag{1} \]

If we form the generating function

\[ F(u, t) = \sum_{n=0}^{\infty} \frac{t^n}{n!} P_n(u), \]

then the equations (1) are equivalent to

\[ F(u, 0) = u, \quad F_t(u, t) = P(u) F_u(u, t). \tag{2} \]


Also, f has a ‘companion’ function $g = \exp\left(\int f(x)\,dx\right)$ satisfying g'(x) = f(x)g(x),
and there are polynomials $Q_n$ so that

\[ g^{(n)}(x) = Q_n(f(x)) g(x), \quad n \ge 0. \]

It is easy to see that the $Q_n$ are determined by the conditions

\[ Q_0(u) = 1, \quad Q_{n+1}(u) = u Q_n(u) + Q_n'(u) P(u) \ \text{ for } n \ge 0. \tag{3} \]
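The two recurrences, $P_0(u) = u$, $P_{n+1} = P_n' P$ and $Q_0(u) = 1$, $Q_{n+1} = u Q_n + Q_n' P$, translate directly into a few lines of polynomial arithmetic. The following is a minimal sketch, not taken from the paper: polynomials are stored as coefficient lists with index equal to the power of u, and the helper names are hypothetical.

```python
def trim(a):
    """Drop trailing zero coefficients (keeping at least one entry)."""
    while len(a) > 1 and a[-1] == 0:
        a = a[:-1]
    return a

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def poly_diff(a):
    return [i * c for i, c in enumerate(a)][1:] or [0]

def derivative_polynomials(P, count):
    """Return ([P_0, ..., P_{count-1}], [Q_0, ..., Q_{count-1}]) for f' = P(f)."""
    Ps, Qs = [[0, 1]], [[1]]                  # P_0(u) = u, Q_0(u) = 1
    for _ in range(count - 1):
        Ps.append(trim(poly_mul(poly_diff(Ps[-1]), P)))
        Qs.append(trim(poly_add(poly_mul([0, 1], Qs[-1]),
                                poly_mul(poly_diff(Qs[-1]), P))))
    return Ps, Qs

# P(u) = u^2 + 1 corresponds to f = tan, g = sec.
Ps, Qs = derivative_polynomials([1, 0, 1], 5)
for n in range(5):
    print(n, Ps[n], Qs[n])
```

For instance, the entry for n = 2 comes out as `[0, 2, 0, 2]` and `[1, 0, 2]`, that is, $P_2(u) = 2u^3 + 2u$ and $Q_2(u) = 2u^2 + 1$, which agrees with differentiating tan x and sec x twice by hand.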


If we define a second generating function G(u, t) by replacing $P_n$ in the definition
of F(u, t) with $Q_n$, then (3) is equivalent to

\[ G(u, 0) = 1, \quad G_t(u, t) = P(u) G_u(u, t) + u G(u, t). \tag{4} \]


Our first result gives explicit formulas for the generating functions F and G. 


Theorem 2.1. For functions f and g as above, the corresponding generating functions
for their derivative polynomials are given by

\[ F(u, t) = f(f^{-1}(u) + t) \quad \text{and} \quad G(u, t) = \frac{g(f^{-1}(u) + t)}{g(f^{-1}(u))}. \]

Proof: Let u = f(x). If F(u, t) = f(x + t) we have F(u, 0) = f(x) = u, and

\[ P(u) F_u(u, t) = \frac{du}{dx} \frac{dx}{du} f'(x + t) = f'(x + t) = F_t(u, t), \]

so F(u, t) = f(x + t) satisfies (2) above. This establishes the first formula; the
proof of the second formula (using (4)) is similar.

In view of (2) and (4) above, the generating functions F and G are determined 
by P alone. Indeed y = f(x) is a solution of the differential equation y'(x) = P(y),
so P determines f up to a constant (i.e., f(x) can be replaced by f(x + c)). Here 
are two examples. 


Example 1. Let f(x) = kx, so that P(u) = k. The generating function F(u, t) is
just u + kt. The companion of f is $g(x) = e^{kx^2/2}$, and from the theorem G(u, t)
is $\exp(tu + kt^2/2)$. Expand this out to get

\[ Q_n(u) = \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{2i} (2i - 1)(2i - 3) \cdots 1 \, k^i u^{n-2i}. \]

In the case $k = -1$, the $Q_n$ are (one variant of) the Hermite polynomials.
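As a quick sanity check on Example 1 (a worked instance added for illustration, using only the recurrence $Q_{n+1}(u) = u Q_n(u) + Q_n'(u) P(u)$ with $P(u) = k$), the first few polynomials are

\[ Q_1(u) = u, \quad Q_2(u) = u^2 + k, \quad Q_3(u) = u^3 + 3ku, \quad Q_4(u) = u^4 + 6ku^2 + 3k^2. \]

For n = 4 the closed-form sum has three terms, $\binom{4}{0} u^4 + \binom{4}{2} \cdot 1 \cdot k u^2 + \binom{4}{4} \cdot (3 \cdot 1) \cdot k^2 = u^4 + 6ku^2 + 3k^2$, in agreement with the recurrence.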


Example 2. The main example of this paper, of course, is the case f(x) = tan x,
g(x) = sec x (i.e. $P(u) = u^2 + 1$). Here the generating functions are

\[ F(u, t) = \frac{u + \tan t}{1 - u \tan t} = \frac{\sin t + u \cos t}{\cos t - u \sin t} \]

and (cf. [10])

\[ G(u, t) = \frac{\cos(\tan^{-1} u)}{\cos(\tan^{-1} u) \cos t - \sin(\tan^{-1} u) \sin t} = \frac{1}{\cos t - u \sin t}. \]

Until now the P, and Q, have been treated in parallel, but separately. The next 
result brings them together. 


Theorem 2.2. The generating functions F and G satisfy (and are determined by) the
conditions

\[ F(u, 0) = u, \quad G(u, 0) = 1, \quad F_t = P(F), \quad \text{and} \quad G_t = FG. \]

Proof: Using 2.1, we have

\[ F_t(u, t) = f'(f^{-1}(u) + t) = P(f(f^{-1}(u) + t)) = P(F), \]

and the second equation is similar.
This result is useful for obtaining recurrences. For instance, in Example 1 above
the equation $G_t = FG$ leads to the recurrence

\[ Q_{n+1}(u) = P_0(u) Q_n(u) + n P_1(u) Q_{n-1}(u) = u Q_n(u) + n k Q_{n-1}(u) \]

since $P_i(u) = 0$ for $i \ge 2$. Compare this with (3) to get $Q_n'(u) = n Q_{n-1}(u)$.
It follows that $y(x) = Q_n(x)$ is a solution of the differential equation
$k y'' + x y' - n y = 0$. (Of course, this is Hermite’s equation if $k = -1$.)


In Example 2, Theorem 2.2 leads to the pair of recurrences

\[ P_{n+1}(u) = \sum_{i=0}^{n} \binom{n}{i} P_i(u) P_{n-i}(u) + \delta_{n0} \quad \text{and} \quad Q_{n+1}(u) = \sum_{i=0}^{n} \binom{n}{i} P_i(u) Q_{n-i}(u). \tag{5} \]


3. COMPUTING PARTICULAR VALUES. Henceforth we specialize to the case
$P(u) = u^2 + 1$. (Note this applies equally to f(x) = tan x, g(x) = sec x and to
f(x) = −cot x, g(x) = csc x.) In this section we show how to find particular
values of $P_n$ and $Q_n$ without computing the polynomials themselves.

Of all the values of the polynomials $P_n$ and $Q_n$, those at zero are of the most
interest: as noted in the introduction, they give the coefficients of the Maclaurin
series for tangent and secant. These numbers can be computed as follows. From
(1) and (3) it is evident that $P_n$ is an odd function if $n$ is even, and $Q_n$ is odd for $n$
odd. Then using (5):

\[ P_1(0) = 1, \qquad P_3(0) = \binom{2}{1} P_1(0)^2 = 2, \]
\[ P_5(0) = \binom{4}{1} P_1(0)P_3(0) + \binom{4}{3} P_3(0)P_1(0) = 16, \]

and thus

\[ Q_0(0) = 1, \qquad Q_2(0) = \binom{1}{1} Q_0(0)P_1(0) = 1, \]
\[ Q_4(0) = \binom{3}{3} Q_0(0)P_3(0) + \binom{3}{1} Q_2(0)P_1(0) = 5, \]
\[ Q_6(0) = \binom{5}{5} Q_0(0)P_5(0) + \binom{5}{3} Q_2(0)P_3(0) + \binom{5}{1} Q_4(0)P_1(0) = 61. \]
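
The hand computation above can be mechanized. The sketch below evaluates the recurrences (5) at $u = 0$, starting from $P_0(0) = 0$ and $Q_0(0) = 1$; the function name `values_at_zero` is an illustrative choice.

```python
from math import comb

def values_at_zero(n_max):
    """Return lists p, q with p[n] = P_n(0) and q[n] = Q_n(0) for 0 <= n <= n_max."""
    p = [0]   # P_0(u) = u, so P_0(0) = 0
    q = [1]   # Q_0(u) = 1
    for n in range(n_max):
        delta = 1 if n == 0 else 0
        # First recurrence of (5), evaluated at u = 0
        p.append(sum(comb(n, i) * p[i] * p[n - i] for i in range(n + 1)) + delta)
        # Second recurrence of (5), evaluated at u = 0
        q.append(sum(comb(n, i) * p[i] * q[n - i] for i in range(n + 1)))
    return p, q

p, q = values_at_zero(7)
print(p)  # P_n(0) for n = 0..7
print(q)  # Q_n(0) for n = 0..7
```

The nonzero entries of `p` and `q` are the values 1, 2, 16 and 1, 1, 5, 61 found above, followed by the next ones in each sequence.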


Other values can be obtained from these via the following functional equation.

Theorem 3.1. If $u \ne 0$, then

\[ P_n(u) = 2^n \left[ P_n\!\left( \frac{u^2 - 1}{2u} \right) + \frac{u^2 + 1}{2u} \, Q_n\!\left( \frac{u^2 - 1}{2u} \right) \right]. \]

Proof: It is enough to show

\[ F(u,t) = F\!\left( \frac{u^2 - 1}{2u}, 2t \right) + \frac{u^2 + 1}{2u} \, G\!\left( \frac{u^2 - 1}{2u}, 2t \right) \]

for $u \ne 0$. Let $u = -\cot x$ and apply 2.1 to get $F(u,t) = -\cot(x + t)$ and
$\csc x \, G(u, t) = \csc(x + t)$. From the half-angle formula for tangent,

\[ F(u,t) = -\cot(2x + 2t) - \csc(2x + 2t) = F(-\cot 2x, 2t) - \csc 2x \, G(-\cot 2x, 2t), \]

from which the result follows.
Putting $u = 1$ in Theorem 3.1, we have

\[ P_n(1) = 2^n (P_n(0) + Q_n(0)) = \begin{cases} 2^n Q_n(0), & n \text{ even}, \\ 2^n P_n(0), & n \text{ odd}. \end{cases} \]


To get the $Q_n(1)$ we need the following recurrence, which complements (5).


Theorem 3.2. For $n \ge 0$,
\[ (u^2 + 1) \sum_{i=0}^{n} \binom{n}{i} Q_i(u) Q_{n-i}(u) = P_{n+1}(u). \]
Proof: It suffices to prove the identity $(u^2 + 1)G(u, t)^2 = F_t(u, t)$, whose right-hand
side is $P(F) = F(u, t)^2 + 1$ by Theorem 2.2. But by 2.1,
\[ F(u,t)^2 + 1 = \tan^2(\tan^{-1}(u) + t) + 1 = \sec^2(\tan^{-1}(u) + t) = \sec^2(\tan^{-1}(u)) \, G(u, t)^2. \]


Given the $P_n(1)$, the $Q_n(1)$ can be obtained recursively by setting $u = 1$ in
Theorem 3.2. At this point the reader may find it instructive to compute $Q_n(1)$.
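
For readers who want to check their answers, here is a minimal Python sketch of the procedure just described. It takes $P_n(1)$ from Theorem 3.1 with $u = 1$, then solves Theorem 3.2 at $u = 1$ for $Q_n(1)$. The function names are illustrative, and the values at zero are recomputed from (5) so that the block is self-contained.

```python
from math import comb

def values_at_zero(n_max):
    """p[n] = P_n(0), q[n] = Q_n(0), computed from the recurrences (5)."""
    p, q = [0], [1]
    for n in range(n_max):
        delta = 1 if n == 0 else 0
        p.append(sum(comb(n, i) * p[i] * p[n - i] for i in range(n + 1)) + delta)
        q.append(sum(comb(n, i) * p[i] * q[n - i] for i in range(n + 1)))
    return p, q

def values_at_one(n_max):
    """Return lists p1, q1 with p1[n] = P_n(1), q1[n] = Q_n(1) for 0 <= n <= n_max."""
    p0, q0 = values_at_zero(n_max + 1)
    # Theorem 3.1 with u = 1: P_n(1) = 2^n (P_n(0) + Q_n(0))
    p1 = [2 ** n * (p0[n] + q0[n]) for n in range(n_max + 2)]
    q1 = [1]  # Q_0 = 1
    for n in range(1, n_max + 1):
        # Terms of the sum in Theorem 3.2 other than i = 0 and i = n
        inner = sum(comb(n, i) * q1[i] * q1[n - i] for i in range(1, n))
        # Theorem 3.2 at u = 1: 2 * (2 Q_n(1) + inner) = P_{n+1}(1)
        q1.append((p1[n + 1] // 2 - inner) // 2)
    return p1[: n_max + 1], q1

p1, q1 = values_at_one(4)
print(p1, q1)
```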


Remark. There are more efficient ways to compute the numbers $P_n(0)$ and $Q_n(0)$
than that outlined here: see [3] and [9]. For the $Q_n(1)$ see [8].


4. IMPROPER INTEGRALS AND SERIES. We now use the polynomials $P_n$ and
$Q_n$ to express some improper integrals and infinite series, starting with the former.


Theorem 4.1. Let $0 < a < 1$. Then for integer $n \ge 0$,

\[ \int_{-\infty}^{\infty} \frac{x^n e^{ax}}{e^x + 1} \, dx = \pi^{n+1} \csc a\pi \, Q_n(-\cot a\pi). \]

Proof: The only pole of the meromorphic function $e^{az}/(e^z + 1)$ inside the rectangle
with vertices $-R$, $R$, $R + 2\pi i$, and $-R + 2\pi i$ is at $\pi i$, where it has residue
$-e^{\pi i a}$. Integrate it around this rectangle and take limits as $R \to \infty$ to obtain the
result for $n = 0$. Then differentiate $n$ times with respect to $a$, noting that the $n$th
derivative of $\csc x$ is $\csc x \, Q_n(-\cot x)$.
If the denominator of the integrand is replaced by $e^x - 1$, we have to modify
the numerator in the case $n = 0$ to get a convergent integral.


Theorem 4.2. Let $0 < a < 1$. Then

(a) $\displaystyle \int_{-\infty}^{\infty} \frac{x^n e^{ax}}{e^x - 1} \, dx = \pi^{n+1} P_n(-\cot a\pi)$ for integer $n \ge 1$, and

(b) $\displaystyle \int_{-\infty}^{\infty} \frac{e^{ax} - e^{(1-a)x}}{e^x - 1} \, dx = -2\pi \cot a\pi$.


Proof: To prove (a), consider $f$ defined by

\[ f(z) = \frac{z e^{az}}{e^z - 1}, \quad z \ne 0; \qquad f(0) = 1. \]

Then $f$ is analytic inside the rectangle with vertices $-R$, $R$, $R + \pi i$, and $-R + \pi i$,
so we can apply Cauchy's theorem and let $R \to \infty$ to get

\[ \int_{-\infty}^{\infty} \frac{x e^{ax}}{e^x - 1} \, dx = -e^{a\pi i} \int_{-\infty}^{\infty} \frac{(x + \pi i) e^{ax}}{e^x + 1} \, dx. \]

Now use 4.1 to compute the right-hand side, and (a) follows for $n = 1$. Differentiation
with respect to $a$ then gives the general case. For (b), note that the integrand
has no poles within the rectangle used to prove (a): integrate it around this
contour, take limits as $R \to \infty$, and simplify using 4.1.

Next we pass from integrals to series. 


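Theorem 4.3. For real $0 < a < 1$ and integer $n \ge 0$,

(a) \[ \sum_{k=0}^{\infty} \left[ \frac{1}{(k+a)^{n+1}} + \frac{(-1)^{n+1}}{(k+1-a)^{n+1}} \right] = \frac{\pi^{n+1}}{n!} P_n(\cot a\pi) \]
and
(b) \[ \sum_{k=0}^{\infty} \left[ \frac{(-1)^k}{(k+a)^{n+1}} + (-1)^n \frac{(-1)^k}{(k+1-a)^{n+1}} \right] = \frac{\pi^{n+1}}{n!} \csc a\pi \, Q_n(\cot a\pi). \]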


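Proof: For $n \ge 1$, let

\[ I_n(a) = \int_0^{\infty} \frac{x^n e^{ax}}{e^x - 1} \, dx. \]

Since

\[ \int_{-\infty}^{0} \frac{x^n e^{ax}}{e^x - 1} \, dx = \int_0^{\infty} \frac{(-x)^n e^{-ax}}{e^{-x} - 1} \, dx = (-1)^{n+1} I_n(1 - a) \]

and $P_n(-x) = (-1)^{n+1} P_n(x)$, 4.2(a) can be restated as

\[ I_n(1 - a) + (-1)^{n+1} I_n(a) = \pi^{n+1} P_n(\cot a\pi). \]

On the other hand,

\[ I_n(a) = \int_0^{\infty} \frac{x^n e^{(a-1)x}}{1 - e^{-x}} \, dx = \sum_{k=0}^{\infty} \int_0^{\infty} x^n e^{-(k+1-a)x} \, dx = \sum_{k=0}^{\infty} \frac{n!}{(k+1-a)^{n+1}}, \]

and so (a) follows for $n \ge 1$. For $n = 0$, note that

\[ \int_{-\infty}^{\infty} \frac{e^{ax} - e^{(1-a)x}}{e^x - 1} \, dx = 2 \int_0^{\infty} \frac{e^{ax} - e^{(1-a)x}}{e^x - 1} \, dx = 2 \sum_{k=0}^{\infty} \left[ \frac{1}{k+1-a} - \frac{1}{k+a} \right] \]

and use 4.2(b). The proof of (b) using 4.1 and the series expansion

\[ \frac{e^{ax}}{e^x + 1} = \sum_{k=0}^{\infty} (-1)^k e^{-(k+1-a)x}, \qquad x > 0, \]

is entirely analogous (but simpler, since no special argument is required for $n = 0$).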


Remark. Equation 4.3(a), for rational a, appears implicitly in §171 of Euler’s 
algebra text [6]. In two earlier papers [4], [5] he relates the left-hand sides of 4.3(a) 
and 4.3(b) (for rational a) to improper integrals of certain rational functions. 

From Theorem 4.3 we can derive a quite useful result on series of powers of
reciprocals of integers. We shall need the following definitions. A function
$\psi: \mathbb{Z} \to \mathbb{C}$ is periodic mod $q$ if $\psi(q) = 0$ and $\psi(n + q) = \psi(n)$ for all $n \in \mathbb{Z}$, and
alternating mod $q$ if $\psi(q) = 0$ and $\psi(n + q) = -\psi(n)$ for all $n \in \mathbb{Z}$. If $\psi$ is a
periodic or alternating function mod $q$, then it is even if $\psi(q - j) = \psi(j)$ for all
$1 \le j < q$, and odd if $\psi(q - j) = -\psi(j)$ for all $1 \le j < q$.


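Theorem 4.4. If $\psi$ is a periodic function mod $q$, then

(a) \[ \sum_{j=1}^{\infty} \frac{\psi(j)}{j^{n+1}} = \frac{\pi^{n+1}}{2 q^{n+1} n!} \sum_{p=1}^{q-1} \psi(p) P_n\!\left( \cot \frac{p\pi}{q} \right) \]

where $n$ is even and $\psi$ is odd, or $n$ is odd and $\psi$ even. If $\psi$ is an alternating function
mod $q$, then

(b) \[ \sum_{j=1}^{\infty} \frac{\psi(j)}{j^{n+1}} = \frac{\pi^{n+1}}{2 q^{n+1} n!} \sum_{p=1}^{q-1} \psi(p) \csc \frac{p\pi}{q} \, Q_n\!\left( \cot \frac{p\pi}{q} \right) \]

when $n$ and $\psi$ are either both even or both odd.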


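Proof: As the proofs of (a) and (b) are similar, we give only the former. Set
$a = p/q$ in 4.3(a) and multiply both sides by $\psi(p)/q^{n+1}$; then sum over $p$ to get

\[ \sum_{p=1}^{q-1} \sum_{k=0}^{\infty} \left[ \frac{\psi(p)}{(qk+p)^{n+1}} + \frac{(-1)^{n+1} \psi(p)}{(qk+q-p)^{n+1}} \right] = \frac{\pi^{n+1}}{q^{n+1} n!} \sum_{p=1}^{q-1} \psi(p) P_n\!\left( \cot \frac{p\pi}{q} \right). \]

If $n$ and $\psi$ have opposite parity, $(-1)^{n+1} \psi(p) = \psi(q - p)$ and the left-hand side
is

\[ \sum_{p=1}^{q-1} \sum_{k=0}^{\infty} \frac{\psi(p)}{(qk+p)^{n+1}} + \sum_{p=1}^{q-1} \sum_{k=0}^{\infty} \frac{\psi(q-p)}{(qk+q-p)^{n+1}} = 2 \sum_{j=1}^{\infty} \frac{\psi(j)}{j^{n+1}}. \]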
Here are some examples: for the first two, it's helpful to recall §3.


Example 1. Since the periodic function mod 2 with $\psi(1) = 1$ is even, we have

\[ 1 + \frac{1}{3^{n+1}} + \frac{1}{5^{n+1}} + \frac{1}{7^{n+1}} + \cdots = \frac{\pi^{n+1}}{2^{n+2} n!} P_n(0), \]

for $n$ odd, from which it follows that

\[ 1 + \frac{1}{2^{n+1}} + \frac{1}{3^{n+1}} + \frac{1}{4^{n+1}} + \cdots = \frac{\pi^{n+1} P_n(0)}{2 (2^{n+1} - 1) n!} \tag{6} \]

for such $n$. Similarly, using the alternating function mod 2 with $\psi(1) = 1$, we obtain

\[ 1 - \frac{1}{3^{n+1}} + \frac{1}{5^{n+1}} - \frac{1}{7^{n+1}} + \cdots = \frac{\pi^{n+1}}{2^{n+2} n!} Q_n(0) \tag{7} \]

for even $n$.
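
Formulas (6) and (7) are easy to test numerically. The sketch below compares each closed form with a partial sum of the corresponding series; the function names and the truncation lengths are illustrative, and the values $P_n(0)$, $Q_n(0)$ are recomputed from (5).

```python
from math import comb, factorial, pi

def values_at_zero(n_max):
    """p[n] = P_n(0), q[n] = Q_n(0), computed from the recurrences (5)."""
    p, q = [0], [1]
    for n in range(n_max):
        delta = 1 if n == 0 else 0
        p.append(sum(comb(n, i) * p[i] * p[n - i] for i in range(n + 1)) + delta)
        q.append(sum(comb(n, i) * p[i] * q[n - i] for i in range(n + 1)))
    return p, q

def zeta_closed_form(n, p0):
    """Right-hand side of (6): sum over j >= 1 of 1/j^(n+1), for odd n."""
    return pi ** (n + 1) * p0[n] / (2 * (2 ** (n + 1) - 1) * factorial(n))

def odd_alternating_closed_form(n, q0):
    """Right-hand side of (7): 1 - 1/3^(n+1) + 1/5^(n+1) - ..., for even n."""
    return pi ** (n + 1) * q0[n] / (2 ** (n + 2) * factorial(n))

def zeta_partial(n, terms):
    return sum(1 / j ** (n + 1) for j in range(1, terms + 1))

def odd_alternating_partial(n, terms):
    return sum((-1) ** m / (2 * m + 1) ** (n + 1) for m in range(terms))

p0, q0 = values_at_zero(6)
print(zeta_closed_form(3, p0), zeta_partial(3, 2000))
print(odd_alternating_closed_form(2, q0), odd_alternating_partial(2, 2000))
```

For $n = 1$ the closed form in (6) reduces to $\pi^2/6$, and for $n = 0$ the closed form in (7) reduces to $\pi/4$.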


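Example 2. By use of alternating functions mod 4, we have

\[ 1 + \frac{1}{3^{n+1}} - \frac{1}{5^{n+1}} - \frac{1}{7^{n+1}} + \frac{1}{9^{n+1}} + \frac{1}{11^{n+1}} - \cdots = \frac{\sqrt{2} \, \pi^{n+1}}{4^{n+1} n!} Q_n(1) \]

if $n$ is even, and

\[ 1 - \frac{1}{3^{n+1}} - \frac{1}{5^{n+1}} + \frac{1}{7^{n+1}} + \frac{1}{9^{n+1}} - \frac{1}{11^{n+1}} - \cdots = \frac{\sqrt{2} \, \pi^{n+1}}{4^{n+1} n!} Q_n(1) \]

if $n$ is odd. Cf. [7], [8].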


Example 3. Dirichlet L-series are defined as follows. Let $\chi$ be a homomorphism
from the units of $\mathbb{Z}/k\mathbb{Z}$ to the nonzero complex numbers. Extend $\chi$ to a function
on $\mathbb{Z}$ (called a Dirichlet character mod $k$) by defining $\chi(n)$ to be $\chi(k\mathbb{Z} + n)$ if
$k\mathbb{Z} + n$ is a unit of $\mathbb{Z}/k\mathbb{Z}$ (i.e. $(n, k) = 1$), and 0 otherwise (so $\chi$ is periodic mod $k$
as defined above). For complex $s$, the Dirichlet L-series corresponding to $\chi$ is

\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_{p \text{ prime}} \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1}. \]

If $\chi$ is a Dirichlet character mod $k$, then either $\chi(k - 1) = 1$ or $\chi(k - 1) = -1$:
$\chi$ is called even in the first case and odd in the second (which agrees with our
previous terminology). By Theorem 4.4, for $\chi$ a Dirichlet character mod $k$ and
$n \ge 1$ an integer,

\[ L(n, \chi) = \frac{\pi^n}{2 k^n (n-1)!} \sum_{j=1}^{k-1} \chi(j) P_{n-1}\!\left( \cot \frac{j\pi}{k} \right) \]

if $\chi$ and $n$ are both even or both odd. Cf. [1], [11].
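
The closed form for $L(n, \chi)$ can be evaluated directly. The sketch below assumes that the $P_n$ are generated by $P_0(u) = u$ and $P_{n+1}(u) = (u^2 + 1)P_n'(u)$, which is what the defining property (the $n$th derivative of $\tan x$ is $P_n(\tan x)$) gives; the function names and the list encoding of $\chi$ are illustrative.

```python
from math import factorial, pi, tan

def derivative_polynomial_p(n):
    """Coefficients of P_n (lowest degree first): P_0 = u, P_{m+1} = (u^2 + 1) P_m'."""
    coeffs = [0, 1]
    for _ in range(n):
        deriv = [d * c for d, c in enumerate(coeffs)][1:]
        nxt = [0] * (len(deriv) + 2)
        for d, c in enumerate(deriv):
            nxt[d] += c        # 1 * P_m'
            nxt[d + 2] += c    # u^2 * P_m'
        coeffs = nxt
    return coeffs

def evaluate(coeffs, u):
    """Horner evaluation of a polynomial given lowest degree first."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * u + c
    return result

def l_value(n, chi, k):
    """L(n, chi) from the closed form; chi is the list [chi(0), ..., chi(k-1)].

    Only valid when chi and n are both even or both odd.
    """
    poly = derivative_polynomial_p(n - 1)
    total = sum(chi[j] * evaluate(poly, 1 / tan(j * pi / k)) for j in range(1, k))
    return pi ** n * total / (2 * k ** n * factorial(n - 1))

chi4 = [0, 1, 0, -1]   # the odd character mod 4
print(l_value(1, chi4, 4))   # Leibniz series 1 - 1/3 + 1/5 - ...
print(l_value(3, chi4, 4))
```

With the odd character mod 4 this recovers the same numbers as (7): $L(1, \chi) = \pi/4$ and $L(3, \chi) = \pi^3/32$.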


5. CONCLUSION. We have emphasized throughout the symmetry between the $P_n$
and the $Q_n$. Mostly they are developed in parallel, but in the recurrences and
functional equation of §3 they intertwine in an essential way.

It is more usual to state the Maclaurin series for tangent and secant, and the
closed form for the series in (6) and (7), in terms of Bernoulli and Euler numbers
(see e.g. [2]). The knowledgeable reader may be wondering how the $P_n$ and $Q_n$ are
related to the Bernoulli and Euler polynomials. It turns out that the rational values
of the Bernoulli and Euler polynomials can be expressed in terms of the $P_n$ and
$Q_n$, but the relation is not a simple one. For the questions considered here, $P_n$ and
$Q_n$ seem more natural.
Wandering Thoughts


A century later, Jonathan Edwards would be
keeping strict accounts of his spiritual life in a
journal he begins... when he is nineteen. The
journal seems to have grown out of a series of
resolutions Edwards made, and against which he
would sometimes dovetail the record of his actual
conduct. ... He can berate himself for feeling “dull,
dry, and dead” on a given day, and although he is
aware of the dangers to one’s health from excessive
self-mortification, he has enough resoluteness to
commit himself to an occasional cold shower of
mathematics. “When I am violently beset with
temptations, or cannot rid myself of evil thoughts,”
he resolves, on July 27, 1723, “to do some Arithmetic,
or Geometry, or some other study, which
necessarily engages all my thoughts, and unavoidably
keeps them from wandering.”


From A Book of One’s Own by Thomas Mallon, 
pp. 106-107, Ticknor & Fields, New York, 1984 
The Law of Large Numbers and $\sqrt{2}$


Thomas M. Liggett and Peter Petersen 


Sukru Yuksel, a graduate student in architecture at UCLA, wondered how
medieval architects approximated $\sqrt{2}$. This would be important, for example, if
they wished to design two square rooms in such a way that the area of the larger
room was double the area of the smaller. Issues of status might dictate relations of
this sort. In considering various possibilities, he discovered the following approach
to this approximation problem. Start with the sequence $a_n = (\sqrt{2})^n =
(1, \sqrt{2}, 2, 2\sqrt{2}, 4, 4\sqrt{2}, \ldots)$. All the numbers appearing in this sequence are “known”
(i.e., are integers) except $\sqrt{2}$ itself. Approximate the “unknown” $\sqrt{2}$ by something,
say 1, so that the resulting sequence becomes $b_n = (1, 1, 2, 2, 4, 4, \ldots)$. Then apply
the following procedure to this sequence: A new collection of sequences $b^{(k)}$ is
defined by

\[ b_n^{(k)} = b_n^{(k-1)} + b_{n+1}^{(k-1)}, \tag{1} \]
with $b^{(0)} = b$:
1 1 2 2 4 4 8 8 
2 3 4 6 8 12 16 
5 7 10 14 20 28 
12 17 24 34 48 
29 41 58 82 
70 99 140 
169 239 


Yuksel observed that the successive ratios $b_1^{(k)}/b_0^{(k)}$ approximate $\sqrt{2}$. For example,
the first few values of this ratio are (rounded to six decimal places)
1, 1.5, 1.4, 1.416667, 1.413793, 1.414286, 1.414201, while the value of $\sqrt{2}$ is
1.4142135... . He wondered how general this phenomenon was, so he came to one
of us for help. Shun-hui Zhu, a Hedrick Assistant Professor at UCLA, was also 
involved in the ensuing discussion. 
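
The procedure is easy to reproduce by machine. The following Python sketch (function names are illustrative, not from the paper) rebuilds the triangle above from the starting sequence (1, 1, 2, 2, 4, 4, ...) and lists, for each row, the ratio of its second entry to its first.

```python
from fractions import Fraction

def initial_sequence(length):
    """b_n = (1, 1, 2, 2, 4, 4, ...): powers of sqrt(2) with sqrt(2) replaced by 1."""
    return [2 ** (n // 2) for n in range(length)]

def next_row(row):
    """Apply (1): each new entry is the sum of the two entries above it."""
    return [row[n] + row[n + 1] for n in range(len(row) - 1)]

def triangle(length):
    """All rows b^(0), b^(1), ... that can be formed from `length` starting terms."""
    rows = [initial_sequence(length)]
    while len(rows[-1]) > 2:
        rows.append(next_row(rows[-1]))
    return rows

rows = triangle(8)
ratios = [Fraction(row[1], row[0]) for row in rows]
for row, ratio in zip(rows, ratios):
    print(row, float(ratio))
```

Exact fractions are used for the ratios so that the printed decimals are not affected by accumulated rounding.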

The proof that the ratios approach the desired limit in this particular case is not
too hard: one simply computes $b_n^{(k)}$ explicitly in terms of $b_n$ (we will do this
shortly; see (2) below), evaluates the resulting sum, and passes to the limit.
Prompted by Yuksel’s question, however, we were interested in seeing how
generally this procedure works. It turns out that the proof of the resulting theorem 
involves some elementary versions of basic probabilistic ideas—the law of large 
numbers and a strengthened form of this called large deviations—and thus 
provides us with an opportunity to see these ideas in action in a relatively simple 


This paper is an outgrowth of an undergraduate colloquium given by the second author at UCLA in 
the Winter of 1993. 
context. Note that in the example above, $\sqrt[n]{b_n} \to \sqrt{2}$, so that the theorem below
applies to this example.


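Theorem. Suppose that $b_n$ is a positive sequence which satisfies

\[ \lim_{n \to \infty} \sqrt[n]{b_n} = \lambda > 0. \]

Define $b_n^{(k)}$ by (1). Then

\[ \lim_{k \to \infty} \frac{b_1^{(k)}}{b_0^{(k)}} = \lambda. \]
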
We begin the proof by computing $b_n^{(k)}$ explicitly in terms of $b_n$. To do so, simply
check by induction that

\[ b_n^{(k)} = \sum_{j=0}^{k} \binom{k}{j} b_{n+j}. \tag{2} \]

The induction step uses the well known identity

\[ \binom{k+1}{j} = \binom{k}{j} + \binom{k}{j-1}. \]
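Let $\gamma(n) = b_n \lambda^{-n}$. Using (2) and writing $b_n$ in terms of $\gamma(j)$ and $\lambda$ gives

\[ b_0^{(k)} = \sum_{j=0}^{k} \binom{k}{j} b_j = \sum_{j=0}^{k} \binom{k}{j} \gamma(j) \lambda^j \]

and

\[ \begin{aligned}
b_1^{(k)} &= \sum_{j=0}^{k} \binom{k}{j} b_{j+1} = \sum_{j=0}^{k} \binom{k}{j} \gamma(j+1) \lambda^{j+1} = \sum_{j=1}^{k+1} \binom{k}{j-1} \gamma(j) \lambda^j \\
&= \sum_{j=1}^{k} \frac{j}{k-j+1} \binom{k}{j} \gamma(j) \lambda^j + \gamma(k+1) \lambda^{k+1}.
\end{aligned} \]

We will prove our convergence statement by obtaining a lower estimate for $b_0^{(k)}$
and an upper estimate for $|b_1^{(k)} - \lambda b_0^{(k)}|$. In order to do so, it is useful to divide the
above sums by $(\lambda + 1)^k$, obtaining

\[ \frac{b_0^{(k)}}{(\lambda+1)^k} = \sum_{j=0}^{k} \binom{k}{j} \gamma(j) \left( \frac{\lambda}{\lambda+1} \right)^j \left( \frac{1}{\lambda+1} \right)^{k-j} \]

and

\[ \frac{b_1^{(k)}}{(\lambda+1)^k} = \sum_{j=1}^{k} \frac{j}{k-j+1} \binom{k}{j} \gamma(j) \left( \frac{\lambda}{\lambda+1} \right)^j \left( \frac{1}{\lambda+1} \right)^{k-j} + \frac{\gamma(k+1) \lambda^{k+1}}{(\lambda+1)^k}. \]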


The key to the proof of the theorem is the observation that these expressions 
can be interpreted as expected values of certain random variables. In order to do 
so, recall that if Y is an integer valued random variable and h is a nonnegative 
function, then the expectation of h(Y) is given by 


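\[ E h(Y) = \sum_{k} h(k) P(Y = k). \]

Therefore

\[ \frac{b_0^{(k)}}{(\lambda+1)^k} = E\gamma(X_k) \tag{3} \]
and
\[ \frac{b_1^{(k)}}{(\lambda+1)^k} = E\left[ \frac{X_k}{k - X_k + 1} \gamma(X_k) \right] + \lambda \gamma(k+1) p^k, \tag{4} \]

where $X_k$ is a random variable with the binomial distribution with parameters $k$
and $p = \lambda/(\lambda + 1)$:

\[ P(X_k = j) = \binom{k}{j} p^j (1 - p)^{k-j}. \]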


This is the distribution of the number of successes in k independent trials with 
success probability p. 
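
The probabilistic reading can be tested numerically. The sketch below uses the $\sqrt{2}$ example, so $b_n = 2^{\lfloor n/2 \rfloor}$ and the limit of the $n$th roots is $\sqrt{2}$; it compares the normalized sums with the corresponding binomial expectations, including the boundary term $\gamma(k+1)\lambda^{k+1}/(\lambda+1)^k$ written as the normalized ratio $\lambda/(\lambda+1)$ to the power $k$ times $\lambda\gamma(k+1)$. All function names are illustrative.

```python
from math import comb, sqrt

lam = sqrt(2)            # limit of the n-th roots in the sqrt(2) example
p = lam / (lam + 1)      # success probability of the binomial distribution

def b(n):
    return 2 ** (n // 2)

def gamma(n):
    return b(n) / lam ** n

def pmf(k, j):
    """P(X_k = j) for X_k binomial with parameters k and p."""
    return comb(k, j) * p ** j * (1 - p) ** (k - j)

def normalized_b0(k):
    return sum(comb(k, j) * b(j) for j in range(k + 1)) / (lam + 1) ** k

def normalized_b1(k):
    return sum(comb(k, j) * b(j + 1) for j in range(k + 1)) / (lam + 1) ** k

def expectation_b0(k):
    """E gamma(X_k)."""
    return sum(gamma(j) * pmf(k, j) for j in range(k + 1))

def expectation_b1(k):
    """E[X_k / (k - X_k + 1) * gamma(X_k)] plus the boundary term."""
    main = sum(j / (k - j + 1) * gamma(j) * pmf(k, j) for j in range(k + 1))
    return main + lam * gamma(k + 1) * p ** k

for k in (1, 5, 20):
    print(k, normalized_b0(k), expectation_b0(k), normalized_b1(k), expectation_b1(k))
```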

The Law of Large Numbers says that $X_k \approx kp$ in an appropriate sense. This
suggests that (3) and (4) are approximately $\gamma(kp)$ and $(p/(1 - p))\gamma(kp) = \lambda\gamma(kp)$
respectively, so that the ratio is approximately $\lambda$ as desired. The precise statement
provided by the Weak Law of Large Numbers in this case is that

\[ \lim_{k \to \infty} P\left\{ \left| \frac{X_k}{k} - p \right| > \varepsilon \right\} = 0 \tag{5} \]

for every $\varepsilon > 0$. (See Theorem 8 of Chapter 4 of [2], for example.) This is not a
strong enough statement to justify the above approximations. We need to know
how rapidly the probabilities in (5) tend to zero. This information is provided by
the most elementary form of a large deviation result.

The theory of large deviations began in the late 1930’s with the work of H.
Cramér. It was fully developed by Donsker and Varadhan (and others) in the late
1970’s and early 1980’s. The excellent book [1] tells the whole story. A large
deviation result says that if $A_k$ is a sequence of events with probabilities tending to
zero as a consequence of a law of large numbers, then these probabilities tend to
zero exponentially rapidly in the sense that

\[ \lim_{k \to \infty} \frac{1}{k} \log P(A_k) \]

exists and is in some sense computable. While this conclusion is certainly not
always valid, it holds sufficiently often that it is very useful.

In the present special case, it is not hard to obtain the needed exponential 
bounds on these probabilities without an appeal to the general theory. It is 
interesting to note, though, that the computations below are the starting point for 
the development of large deviation theory. Here is the statement we need in the 
present context: 


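Proposition. For every $\varepsilon > 0$ there is an $r < 1$ such that

\[ P\left\{ \left| \frac{X_k}{k} - p \right| > \varepsilon \right\} \le 2r^k \tag{6} \]
for all $k \ge 1$.
Proof: Take $\theta > 0$, and write
\[ e^{\theta k (p + \varepsilon)} P\left\{ \frac{X_k}{k} - p > \varepsilon \right\} \le E e^{\theta X_k} = [e^{\theta} p + q]^k, \tag{7} \]
($q = 1 - p$). The inequality is a form of Chebyshev’s inequality: we simply
replace $X_k$ by its minimum value ($k(p + \varepsilon)$) on the event

\[ \left\{ \frac{X_k}{k} - p > \varepsilon \right\} \]

and replace the nonnegative $e^{\theta X_k}$ by 0 on the complementary event. The equality
in (7) can be obtained from the definition of expectation and the binomial
theorem:

\[ E e^{\theta X_k} = \sum_{j=0}^{k} e^{\theta j} \binom{k}{j} p^j (1 - p)^{k-j} = [e^{\theta} p + q]^k. \tag{8} \]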

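Rewriting (7) gives

\[ P\left\{ \frac{X_k}{k} - p > \varepsilon \right\} \le \left[ p e^{\theta(q - \varepsilon)} + q e^{-\theta(p + \varepsilon)} \right]^k. \]

Noting that $p e^{\theta(q - \varepsilon)} + q e^{-\theta(p + \varepsilon)}$ has value 1 and derivative $-\varepsilon$ at $\theta = 0$, it follows
that this function is $< 1$ for small positive $\theta$. Therefore, for each $\varepsilon > 0$ there is an
$r < 1$ such that

\[ P\left\{ \frac{X_k}{k} - p > \varepsilon \right\} \le r^k. \]

The same argument (with $\theta < 0$) shows that there is an $r < 1$ such that

\[ P\left\{ \frac{X_k}{k} - p < -\varepsilon \right\} \le r^k. \]

The proposition is obtained by combining these results (and choosing the larger $r$).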
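
To see the exponential bound at work, the sketch below computes the exact upper tail of a binomial distribution and compares it with the $k$th power of the base $p e^{\theta(q-\varepsilon)} + q e^{-\theta(p+\varepsilon)}$ for one fixed small $\theta$. The particular values of $p$, $\varepsilon$ and $\theta$ are arbitrary illustrative choices, and no attempt is made to optimize over $\theta$.

```python
from math import comb, exp

def upper_tail(k, p, eps):
    """Exact P(X_k / k - p > eps) for X_k binomial with parameters k and p."""
    return sum(comb(k, j) * p ** j * (1 - p) ** (k - j)
               for j in range(k + 1) if j / k - p > eps)

def base(theta, p, eps):
    """The quantity p e^{theta (q - eps)} + q e^{-theta (p + eps)} with q = 1 - p."""
    q = 1 - p
    return p * exp(theta * (q - eps)) + q * exp(-theta * (p + eps))

p, eps, theta = 0.5, 0.1, 0.4
r = base(theta, p, eps)
print("r =", r)
for k in (10, 50, 200):
    print(k, upper_tail(k, p, eps), r ** k)
```

Any $\theta > 0$ gives a valid bound; the point of the proof is only that some choice makes the base strictly less than 1.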
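Returning to the proof of the theorem, the assumption on the $b_n$’s implies that
$\sqrt[n]{\gamma(n)} \to 1$, so that for any $\delta > 0$, there are constants $c$ and $C$ such that

\[ c e^{-\delta n} \le \gamma(n) \le C e^{\delta n} \tag{9} \]
for all $n \ge 0$. It follows from (8) and (9) that
\[ c [e^{-\delta} p + q]^k \le E\gamma(X_k) \le C [e^{\delta} p + q]^k. \tag{10} \]

Recall that we needed a lower estimate for $b_0^{(k)}$ and an upper estimate for
$|b_1^{(k)} - \lambda b_0^{(k)}|$. Referring to (3), we see that the first of these is given by (10). For
the second, break up the expectation of the expression obtained from (3) and (4)
according to whether $|X_k/k - p| \le \varepsilon$ or $> \varepsilon$ to obtain (assuming (6), (9) and (10)
hold)

\[ \begin{aligned}
\frac{|b_1^{(k)} - \lambda b_0^{(k)}|}{(\lambda+1)^k}
&\le E\left| \frac{X_k}{k - X_k + 1} - \frac{p}{1-p} \right| \gamma(X_k) + \lambda \gamma(k+1) p^k \\
&\le \max_{|u - kp| \le \varepsilon k} \left| \frac{u}{k-u+1} - \frac{p}{1-p} \right| E\gamma(X_k) + \lambda \gamma(k+1) p^k \\
&\quad + \max_{0 \le u \le k} \left| \frac{u}{k-u+1} - \frac{p}{1-p} \right| C e^{\delta k} P\left\{ \left| \frac{X_k}{k} - p \right| > \varepsilon \right\} \\
&\le \max_{|u - kp| \le \varepsilon k} \left| \frac{u}{k-u+1} - \frac{p}{1-p} \right| E\gamma(X_k) + \lambda \gamma(k+1) p^k \\
&\quad + \max\left\{ \frac{p}{1-p}, \; k - \frac{p}{1-p} \right\} 2 r^k C e^{k\delta}.
\end{aligned} \tag{11} \]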
For fixed $\varepsilon$, choose $r < 1$ so that (6) holds, then choose $\delta$ so that
\[ \frac{e^{\delta} r}{e^{-\delta} p + q} < 1 \quad \text{and} \quad \frac{e^{\delta} p}{e^{-\delta} p + q} < 1, \]
and the corresponding constants so that (9) and (10) hold. Now let $k \to \infty$. Finally,
let $\varepsilon \to 0$ in order to see by combining (3), (10) and (11) that

\[ \frac{|b_1^{(k)} - \lambda b_0^{(k)}|}{b_0^{(k)}} \]
tends to zero as required.

We conclude this note with two remarks. First, there is the issue of rates of 
convergence. In the general context of the theorem, there is nothing that can be 
said about this. However, in the context of the example with which we started, one 
can see by direct computation of (2) that the convergence is exponentially rapid. 


The exponential rate depends on the irrational being approximated; the error in
approximating $\lambda = \sqrt{m}$ for an integer $m$ is of order a constant multiple of

\[ \left( \frac{\lambda - 1}{\lambda + 1} \right)^k. \]
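
The stated rate can be observed directly in the $\sqrt{2}$ example. The sketch below (function name illustrative) computes the ratio after $k$ steps with exact integer arithmetic and divides the error by the $k$th power of $(\sqrt{2} - 1)/(\sqrt{2} + 1)$; if the claim holds, the quotient should stay roughly constant as $k$ grows.

```python
from math import sqrt

def ratio_after(k):
    """Ratio of the second to the first entry of the k-th row, starting from 1, 1, 2, 2, 4, 4, ..."""
    row = [2 ** (n // 2) for n in range(k + 2)]
    for _ in range(k):
        row = [row[n] + row[n + 1] for n in range(len(row) - 1)]
    return row[1] / row[0]

target = sqrt(2)
rate = (target - 1) / (target + 1)
for k in range(4, 12):
    error = abs(ratio_after(k) - target)
    print(k, error, error / rate ** k)
```

Only moderate values of $k$ are used, since for large $k$ the error falls below double-precision resolution.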


Secondly, one can imagine higher dimensional versions of this approximation
scheme. For example, one could take a doubly indexed array $a_{m,n} = \lambda_1^m \lambda_2^n$, take an
approximating array $b_{m,n}$, and replace (1) by

\[ b_{m,n}^{(k)} = b_{m,n}^{(k-1)} + b_{m+1,n}^{(k-1)} + b_{m,n+1}^{(k-1)}. \]

The analysis is similar, with the binomial distribution being replaced by the
trinomial distribution with parameters $k$ and

\[ \frac{\lambda_1}{1 + \lambda_1 + \lambda_2}, \quad \frac{\lambda_2}{1 + \lambda_1 + \lambda_2}, \quad \frac{1}{1 + \lambda_1 + \lambda_2}. \]


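Under an analogous assumption on the behavior of the $n$th roots, the result is that

\[ \lim_{k \to \infty} \frac{b_{1,0}^{(k)}}{b_{0,0}^{(k)}} = \lambda_1 \]
and
\[ \lim_{k \to \infty} \frac{b_{0,1}^{(k)}}{b_{0,0}^{(k)}} = \lambda_2. \]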
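
A small experiment illustrates the two-dimensional scheme. The sketch below is an assumption-laden example rather than anything taken from the paper: it approximates the array of products of powers of $\sqrt{2}$ and $\sqrt{3}$ by replacing each of the two roots by 1, applies the three-term rule $k$ times, and compares the ratios of the entries at positions (1, 0) and (0, 1) to the entry at (0, 0) with $\sqrt{2}$ and $\sqrt{3}$. The function names are illustrative.

```python
from math import sqrt

def step(grid):
    """One application of the three-term rule; the grid shrinks by one in each direction."""
    size = len(grid) - 1
    return [[grid[m][n] + grid[m + 1][n] + grid[m][n + 1] for n in range(size)]
            for m in range(size)]

def ratios_after(k):
    """Ratios b_{1,0}/b_{0,0} and b_{0,1}/b_{0,0} after k steps."""
    size = k + 2
    # Hypothetical starting array: sqrt(2)^m * sqrt(3)^n with both roots replaced by 1
    grid = [[2 ** (m // 2) * 3 ** (n // 2) for n in range(size)] for m in range(size)]
    for _ in range(k):
        grid = step(grid)
    return grid[1][0] / grid[0][0], grid[0][1] / grid[0][0]

first, second = ratios_after(40)
print(first, sqrt(2))
print(second, sqrt(3))
```

Integer arithmetic keeps the table exact; only the final two divisions are done in floating point.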
Exponentiation in Rings


R. H. Redfield’ 


1. EXPONENTIAL RINGS. Adding an element of a group to itself may be viewed 
as multiplying the element by two. Analogously, multiplying an element of a ring 
by itself may be viewed as raising it to the second power. Although the theory of 
groups with a multiplication (i.e., the theory of rings) is very well developed, there 
seems to be no axiomatized theory of rings with an exponentiation. The purpose of 
this paper is to suggest a formal setting for such a theory and to determine some of 
the properties of the resulting structures. Underlying the exposition are the 
general references [1, 2]. 

The natural numbers may be used as exponents for all the elements of any 
commutative ring. For the real numbers, there are more possibilities. For instance, 
$r^e$ exists for all real numbers $e$ and all positive real numbers $r$. With this in mind,
we consider bases and exponents separately for the general case. Specifically, let $R$
be a ring, let $B$ (the bases) be a multiplicative subsemigroup of $(R, \cdot)$ which does
not contain 0, and let $E$ (the exponents) be a semiring with unit element 1. (That
is, $(E, +)$ is a semigroup, $(E, \cdot)$ is a semigroup with 1, and $\cdot$ distributes over $+$
from both the left and the right.) A binary operation $B \times E \to B \subseteq R$ ($(b, e) \mapsto b^e$)
makes $(R, B, E)$ an exponential ring if it has the usual properties for exponentiation
(viz., for all $b, d \in B$, and $e, k \in E$, $b^e d^e = (bd)^e$, $b^{ek} = (b^e)^k$, $b^{e+k} = b^e b^k$,
and $b^1 = b$).

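There are many familiar examples of exponential rings. If for any ring $R$,
$R^* = \{r \mid r \ne 0\}$, and for any partially ordered ring $R$, $R^{\ge} = \{r \mid r \ge 0\}$ and $R^{>} =
\{r \mid r > 0\}$, then, with respect to the usual exponentiation, $(R, R^*, \mathbb{Z}^{\ge})$ is an
exponential ring for any integral domain $R$, and $(F, F^*, \mathbb{Z})$ is an exponential field
for any field $F$. As well, $(\mathbb{R}, X, \mathbb{Z})$ and $(\mathbb{C}, X, \mathbb{Z})$, where $X$ is any of $\mathbb{R}^{>}$, $\mathbb{Q}^*$, $\mathbb{Q}^{>}$,
are all exponential fields, as are $(\mathbb{R}, \mathbb{R}^{>}, \mathbb{Q})$, $(\mathbb{C}, \mathbb{R}^{>}, \mathbb{Q})$, $(\mathbb{R}, \mathbb{R}^{>}, \mathbb{R})$ and $(\mathbb{C}, \mathbb{R}^{>}, \mathbb{R})$,
and $(\mathbb{Z}_2, \mathbb{Z}_2^*, \mathbb{Z}_2)$ with exponentiation $1^0 = 1 = 1^1$ is a finite exponential field [3].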
Other examples of exponential rings can be constructed by using products or sets 
of homomorphisms (see §§2 and 3 respectively). 

Still more examples can be built from those above by expanding the semiring of 
exponents. For if $(R, B, E)$ is an exponential ring and $S$ is any semiring with unit
element, then $(R, B, E \times S)$ is an exponential ring with respect to the exponentiation
$b^{(e,s)} = b^e$. However, from the point of view of the exponential structure,


These ideas arose from a discussion with a student, Gary Rosys, following a class in ring theory. 
We were discussing exponentiation as a possible third operation and J remarked that when I was a 
graduate student, one of my professors said that he had once tried to investigate this but without 
success. Gary wanted to pursue the idea and eventually I suggested the definition given here and went 
on to direct his senior thesis [3] on the topic. He stated and proved Proposition 2.2; we worked out the 
example following Proposition 2.2 one afternoon. 
there is no real difference between the exponential rings $(R, B, E)$ and $(R, B, E \times
S)$. This sort of trivial extension can be eliminated by considering only exponential
rings which are reduced in the following sense.


2. REDUCED EXPONENTIAL RINGS. If $(R, B, E)$ is an exponential ring, then
the binary relation defined on $E$ by letting $e K d$ if and only if $b^e = b^d$ for all
$b \in B$ is easily seen to be a congruence relation. If $[e] = [d]$ for $e, d \in E$, then
$b^e = b^d$ for all $b \in B$ and hence the exponentiation $b^{[e]} = b^e$ is well-defined on $B$;
it is easy to check that, with respect to this exponentiation, $(R, B, E/K)$ is an
exponential ring. An exponential ring $(R, B, E)$ is then reduced if $K$ is the identity
congruence on $E$.

In contrast to the construction described at the end of §1, there is an upper 
bound on the sets of exponents which are available for making a given ring and set 
of bases a reduced exponential ring. Specifically, let $R$ be a ring and let $B$ be a
multiplicative subsemigroup of $R$ not containing 0. Then $\mathrm{Hom}(B, B)$ is a semiring
with unit element with respect to the operations $(e + k)(b) = e(b)k(b)$ and
$(ek)(b) = (k \circ e)(b)$, and $(R, B, \mathrm{Hom}(B, B))$ is a reduced exponential ring with
respect to the exponentiation $b^e = e(b)$. The semiring $\mathrm{Hom}(B, B)$ is the largest
possible semiring of exponents in the following sense.


Proposition 2.1. Suppose that $(R, B, E)$ is an exponential ring and for $e \in E$, let
$T(e): B \to B$ be the function $T(e)(b) = b^e$. Then $T$ is a homomorphism of $(E, +, \cdot)$
into $(\mathrm{Hom}(B, B), +, \cdot)$ with kernel $K$ and thus if $(R, B, E)$ is reduced, $T$ is
one-to-one.


Proof: If $e, k \in E$ and $b, d \in B$, then $T(e)(bd) = (bd)^e = b^e d^e = T(e)(b)T(e)(d)$
so that $T(e) \in \mathrm{Hom}(B, B)$. Furthermore, $T(e + k)(b) = b^{e+k} = b^e b^k =
T(e)(b)T(k)(b) = (T(e) + T(k))(b)$ and $T(ek)(b) = b^{ek} = (b^e)^k = (T(k) \circ
T(e))(b) = (T(e)T(k))(b)$, from which it follows that $T$ is a homomorphism. Since
$T(e) = T(d)$ if and only if $b^e = T(e)(b) = T(d)(b) = b^d$ for all $b$, $T(e) = T(d)$ if
and only if $e K d$, and hence $T$ has kernel $K$. □


We say that (R, B*, E*) extends (R,B,E) if BCB* and there exists a 
one-to-one homomorphism 7,: E — E* such that 7,(1) = 1 and for all b € B and 
e €& FE, b™e©) = b*. In the reduced case, Proposition 2.1 allows maximal extensions 
to be determined as follows. Zorn’s Lemma produces an extension (R, B”’, E) 
such that B” is a maximal semigroup of bases. If (R, B, E) is reduced and 
commutative, then a fortiori (R, B“, E) is reduced and commutative and Zorn’s 
Lemma and Proposition 2.1 together produce at least one reduced commutative 
extension (R,B%,E%) such that E” is a maximal commutative semiring of 
Hom(B%, B’%). (in the noncommutative case, E*= Hom(B%, B%).) We claim 
that‘ (R, B*, E*%) is maximal. For if (R, B*, E*) is a reduced commutative 
extension of (R, B’, E%), then (R, B*, 7,.°7,(E)) extends (R, B%, t,.°7,(E)) 
and by the maximality of B*, B* = B”. Furthermore, since (R, B’, E*) extends 
(R,B*,E*), TCE*) DT (7, E*)) = E%, and hence by maximality of E”%, 
T(E*) = T(1,(E “%)). Then since T is one-to-one, E* = 7,.(E“) and hence rt, 
is an isomorphism. It follows that (R, B*, E “) is a maximal extension of (R, B, E). 

Note that in the commutative case, E “ may not be all of Hom(B%, B“%). For 
suppose that F is a field of characteristic zero with noncommutative Galois group 
G over Q. Then (F, F*, Z) is a reduced exponential field for which F* is a 
maximal semigroup of bases. Restricting 0 in G to F” yields a function o, in 
Hom(F *, F*), and if oor # 7°o in G, then 7,0, # 0,7, in Hom(F”, F”). So
E”’ cannot be all of Hom(F”, F™*). It is nevertheless sometimes possible to 
recognize maximal semirings of exponents even in the commutative case. 


Proposition 2.2 [3]. If $(R, B, E)$ is a reduced commutative exponential ring for
which there exists $\beta \in B$ such that $\{\beta^e \mid e \in E\} = B$, then $E$ is a maximal semiring of
exponents.


Proof: Suppose that (R, B, E*) is a reduced commutative extension of (R, B, E). 
If z © E*, then B?% = B* = B= for some s € E. But for all b € B, b = B* for 
some e € E and b? = B® = B** = BEC = b7E6). And since (R, B, E*) is re- 
duced, z = 7,(s) € 7,(E). Thus E* = 7,(E). Z 


For example, let $R = \mathbb{Z}$, $B = \mathbb{Z}^{>}$, and $E = \mathbb{Z}^{\ge}[x]$, and let $p_1 = 2$, $p_2 = 3, \ldots$
denote the prime numbers in ascending order. For $k \in B$, $k = \prod_{i=1}^{\infty} p_i^{e[i]}$, where
the $e[i]$ are nonnegative and eventually zero. For $n \in \mathbb{N}$, let $k^{x^n} = \prod_{i=1}^{\infty} p_{i+n}^{e[i]}$, and
for $k \in B$ and $f(x) = a_0 + a_1 x + \cdots + a_n x^n \in E$, let $k^f = k^{a_0 + a_1 x + \cdots + a_n x^n} =
(k)^{a_0} (k^x)^{a_1} \cdots (k^{x^n})^{a_n}$. It is straightforward to check that $(R, B, E)$ is a reduced
commutative exponential ring [3]. But for all $i \ge 1$, $p_i = 2^{x^{i-1}}$ and hence $B =
\{2^f \mid f(x) \in E\}$. So by Proposition 2.2, $E$ is a maximal semiring of exponents.
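
The prime-shifting exponentiation in this example can be tried out concretely. The sketch below reads $B$ as the positive integers and $E$ as the polynomials in $x$ with nonnegative integer coefficients (an interpretation of the notation, stated here as an assumption), represents a polynomial by its coefficient list, and checks the exponential-ring laws on sample values. All function names are illustrative.

```python
from itertools import count, islice

def prime_stream():
    """Yield 2, 3, 5, ... by trial division against the primes found so far."""
    found = []
    for candidate in count(2):
        if all(candidate % q for q in found):
            found.append(candidate)
            yield candidate

def exponent_vector(k):
    """Exponents e[1], e[2], ... of k over the primes in ascending order."""
    vec, stream = [], prime_stream()
    while k > 1:
        q, e = next(stream), 0
        while k % q == 0:
            k //= q
            e += 1
        vec.append(e)
    return vec

def shift(k, n):
    """k raised to x^n: replace the i-th prime in k by the (i+n)-th prime."""
    vec = exponent_vector(k)
    primes = list(islice(prime_stream(), len(vec) + n))
    result = 1
    for i, e in enumerate(vec):
        result *= primes[i + n] ** e
    return result

def power(k, f):
    """k raised to the polynomial f = [a_0, a_1, ..., a_n]."""
    result = 1
    for m, a in enumerate(f):
        result *= shift(k, m) ** a
    return result

def poly_add(f, g):
    size = max(len(f), len(g))
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0) for i in range(size)]

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, c in enumerate(g):
            out[i + j] += a * c
    return out

print(power(2, [0, 1]), power(2, [0, 0, 1]), power(12, [0, 1]))
```

Here `power(2, [0, 0, 1])` is 2 raised to $x^2$, which is the third prime, in line with the remark that every prime is a power of 2 in this structure.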

For another example, suppose that R is a commutative ring, that EF is a 
commutative semiring with unit element, and that exp: E — R” is a one-to-one 
function such that exp(x + w) = exp(x)exp(w). If B = exp(E), then it is easy to 
check that (R, B, E) is a commutative exponential ring with respect to the 
exponentiation b* = exp(exp~'(b)x). If b* = b” for all b € B, then in particular 
exp(1)* = exp(1)” and hence x = exp~ (exp[exp~ (exp(1))x]) = exp~ 'Cexp(1)*) = 
exp ‘(exp(1)”) = w so that (R, B, E) is reduced. Furthermore, for all b € B, 
b = exp(1)**? ©, and hence B = {exp(1)*|x € E}. It follows from Proposition 2.2 
that E is a maximal semiring of exponents. 

Proposition 2.2 also allows certain maximal extensions to be recognized. 
For instance, consider (R,R~,R). If (R, B, E) is a reduced commutative exten- 
sion of (R,R*,R) and b €& B, then b?=re€R” and hence b= 5b! = 57x) = 
b7RA)TRI/2) — (h2)7RU/% — ~!/2 © R>. Then B= R” and hence Proposition 2.2 
implies that E = 7,(R). It follows that (R,R~,R) has no nontrivial reduced 
commutative extensions. 


3. ORDERED EXPONENTIAL RINGS. If $(R, B, E)$ is an exponential ring, then
$B$ is similar to the positive cone of a compatible partial order on $R$ in the sense
that $BB \subseteq B$. If $B$ is additively closed and cancelative, then $B$ indeed forms such a
cone. For by hypothesis, $BB \subseteq B$ and $B + B \subseteq B$. Furthermore, if $x \in B \cap (-B)$,
then $0 = x + (-x) \in B$, a contradiction. Thus $B \cap (-B) = \emptyset$ and therefore $B$ is
the cone of strictly positive elements of the following compatible order on $R$: $r < s$
if and only if $s - r \in B$.

Sometimes this partial order on R determines a compatible partial order on E. 
For if E is a ring and 1 € B, then for all b € B, b°b' = b! = 1b', and since B is 
cancelative, b° = 1. Thus since b~'b = b° = bb!" B is a partially ordered group. 
But then the order-preserving functions in Hom(B, B) form the positive cone of a 
partial order compatible with both + and -, and thus by Proposition 2.1, if 
(R, B, E) is reduced, E inherits a compatible partial order from Hom(B, B): e < k 
in E if and only if c*~* < d*~* whenever c < d in B. However, since 1*~°1*~* = 
1*~¢ and B is a group, 1*~° = 1. Thus, if 1 < b and c*~* < d*~* whenever c < d, 
then b° = b°1*~* < b°b*~* = b*, and if c < d and b® < b* whenever 1 < b, then
ck-* = ¢*(c~!d)*d~* < c*(c~'d)*d~* = d*~°. It follows that the order on E may 
also be described: e < k in E if and only if b° < b* whenever 1 < b in B. 

This alternative description of the order on E also makes sense when E is only 
a semiring and thus we may use it for a general definition. Specifically, define an 
ordered exponential ring to be an exponential ring (R, B, E) for which R is a 
partially ordered ring, E is a partially ordered semiring, 1 © B = R~ , andife<k 
in E, then b° < b* whenever 1 < b in B. 


Examples. The construction above shows that any exponential ring (R, B, E) for 
which £ is aring, 1 € B, and B is additively closed and cancelative, may be turned 
into an ordered exponential ring. As well, if F is a totally ordered field, (F, F ~ , Z) 
is an ordered exponential field, as is (R,R~ ,R). For another example, note that 
(RXR, R* XR*,R*XR) is a commutative exponential ring with respect 
to pointwise addition and multiplication and the pointwise exponentiation 
(r,s) = (r*, s*). Taking R* XR” as the strictly positive cone for both the ring 
and the exponents makes (R X R,R* XR7~,R X R) an ordered exponential ring. 

Note that the underlying ring in the last, non-totally ordered, example is not 
lattice-ordered. This is not unusual. For suppose that (R, B, E) is an ordered 
commutative exponential ring such that E is a ring and B is cancelative. If 
x >0<y in R, then x + y € B, and since (as we observed above) B is a group in 
this situation, (x + y)”>! © B. Then z = xy(x + y)7! © B and hence 0 < z. But 
since y <x + y, y(x + y)7! <1s0 that z <x and similarly (since B is commuta- 
tive) z < y. It follows that R is an antilattice in the sense that the greatest lower 
bound (or least upper bound) of two elements exists if and only if the elements are 
comparable, and thus that in this case, R is in fact totally ordered whenever it is 
lattice-ordered. 

The example above shows that R may be an antilattice which is not totally 
ordered. What conditions force R to be totally ordered? In the case where R is a 
directed integral domain and E is a ring containing 1/2, then R is totally ordered 
if and only if squares in R are positive. For if R is totally ordered, clearly 
squares are positive, and if squares are positive, then in particular (b — 1)? > 0 
for any DEB. Since 1/2 EE, (b-—1)* =z’ for some z &BU {0}. Then 
(b-—1+2z\Xb-—1-—z)=0, and since R is an integral domain, either b — 1 = 
—z<0Oorb-—1=z=2 0. Then B is totally ordered and thus, since R is directed 
and B = R°”, R is totally ordered. 


Order-theoretic interactions between bases and exponents. Requiring that squares be 
positive involves no restriction on the partial ordering of the exponents. However, 
for some ordered exponential rings, order-theoretic properties of the bases are 
related to similar properties of the exponents. For example, recall that a partially
ordered group is archimedean if $g = 1$ whenever $1 \leq g^n \leq h$ for all positive
integers n and some $h > 1$, and suppose that (R, B, E) is a directed reduced
exponential ring for which E is a ring and B is an archimedean group. We claim
that E must be archimedean as well. Note first that since B is a group and
$b^0 b^1 = b^1 = b^1 b^0$ for all $b \in B$, $b^0 = 1$ for all b. Thus if $b^e = 1$ for all b, then
$b^e = b^0$ for all b, and since (R, B, E) is reduced, $e = 0$. Hence if $0 < ne < k$ for
all $n > 0$, then $b^e \neq 1$ for some b. Since R (and hence B) is directed, there exists
$d \in B$ such that $1 < d$, $b < d$ and $b^{-1} < d$, i.e., such that $1 < d$, $1 < db^{-1}$ and
$1 < db$. Then $1 = d^0 \leq d^e$. If $d^e = 1$, then $1 = (db^{-1})^0 \leq (db^{-1})^e = (b^{-1})^e = (b^e)^{-1}$
and $1 = (db)^0 \leq (db)^e = b^e$ so that $b^e = 1$, a contradiction. If $d^e > 1$, then
since $(d^e)^n = d^0 d^{ne} \leq d^{k-ne} d^{ne} = d^k$ for all $n > 0$, B is not archimedean, also a
contradiction. It follows that E must be archimedean.

The opposite relationship sometimes occurs when the exponents are totally
ordered. Of course, $(\mathbb{R} \times \mathbb{R}, \mathbb{R}^{+} \times \mathbb{R}^{+}, \mathbb{Z})$ is an ordered exponential ring with
respect to the usual exponentiation; so if the order on the exponents is to influence
the order on the bases (and hence on the ring), the set of exponents must be as
large as possible. With this (and Proposition 2.2) in mind, suppose that (R, B, E) is
a directed exponential ring for which there exists $1 < \beta \in B$ such that
$B = \{\beta^e \mid e \in E\}$. If $a, c \in B$, then $a = \beta^x$ and $c = \beta^y$, and if E is totally ordered, then
either $x \leq y$ or $x > y$. But if $x \leq y$, then $a = \beta^x \leq \beta^y = c$; and if $x > y$, then
$a = \beta^x \geq \beta^y = c$. It follows that B is totally ordered and hence that R is as well.

Note finally that since the only totally ordered archimedean rings with nontrivial
multiplications are the subrings of $\mathbb{R}$ [1, p. 126], the observations above show that
the only directed reduced exponential rings (R, B, E) whose exponents E form
a totally ordered ring, whose bases B form an archimedean group, and for
which $B = \{\beta^e \mid e \in E\}$ for some $1 < \beta \in B$, are ordered exponential subrings of
$(\mathbb{R}, \mathbb{R}^{+}, \mathbb{R})$.
Integer Hexahedra Equivalent 
to Perfect Boxes 


Blake E. Peterson and James H. Jordan 


1. INTRODUCTION. A perfect box (cuboid) would be a rectangular parallelepiped
with all edges and all diagonals having integer lengths. Even though great 
mathematicians have searched for a perfect box (cuboid) for many centuries, the 
existence of a perfect box is still an unsolved problem. Euler established an infinite 
collection of non-similar ‘good tries’ where only the interior diagonals failed to 
have an integer length. The smallest of Euler’s examples had edges of length 44, 
117 and 240. For further background on the problem refer to Vic Klee and Stan 
Wagon’s recent book [4] or Richard Guy’s book [2]. 

An integer hexahedron is a six-faced polyhedron with all edges and all diagonals
having integer lengths. Since a perfect box (cuboid) would be a special case of an
integer hexahedron, a natural question is "Are there integer hexahedra?"
This has been answered in the affirmative. In 1988 Heiko Harborth and Arnfried
Kemnitz [3] displayed a very small example, Figure 1, that can be described as
adjoining the bases of two congruent tetrahedra whose bases are equilateral
triangles of side length 3 and whose other faces are all isosceles triangles with
equal sides of length 2. (The interior diagonal is also of length 2.) Möller [5]
provides a description of numerous other examples which are pyramids with a
cyclic integer pentagon as a base and congruent isosceles triangles as the other five
faces, where the triangles' equal sides have length an integer larger than the radius
of the circle that circumscribes the cyclic pentagon. The smallest two of these are
displayed in Figure 2. The first of the examples has as its base an integer
pentagon that is a subset of an integer hexagon which was known to Euler. The
second of the examples has as its base an integer pentagon that is displayed in an
article by Miller [6]. The other nonsimilar examples involving integer pentagons
as bases of pyramids are explicitly described by Möller [5].


Figure 1. Double Tetrahedron

Figure 2. Small Pentagonal-Based Hexahedron

The examples of Harborth and Kemnitz or Möller are not combinatorially
equivalent to a perfect box since they have only five or six vertices as opposed to
eight and their faces are triangles or pentagons as opposed to quadrilaterals
(rectangles). A question about a combinatorial equivalent to a perfect box would
be "Are there integer hexahedra with eight vertices that have all six faces
quadrilaterals?" It is the purpose of this paper to answer this question in the
affirmative by displaying numerous nonsimilar examples.
Figure 3. Planar Depiction of Figure 4 


Figure 4. Smallest Example 


2. OUR SMALLEST EXAMPLE. Consider two congruent rectangles with sides of 
length 8 and 15 and diagonals of length 17 and four congruent integer isosceles 
trapezoids with parallel sides of length 8 and 15, slant edges of length 7, and 
diagonals of length 13. Place them on a plane in the array of Figure 3. These can 
be folded together into the integer hexahedron of Figure 4 where P;, P,, P,, and P, 
determine the vertices of a planar isosceles trapezoid with parallel sides 8 and 15 
and slant sides 13. 

The distance between the parallel planes of the rectangles is $7/\sqrt{2}$ and the
interior diagonals are all of length 17. One possible set of eight points in Euclidean
three-space describing the vertices is $(\pm 4, \pm 15/2, 7/\sqrt{8})$, $(\pm 15/2, \pm 4, -7/\sqrt{8})$.
The volume of the example is approximately 634.39, it has surface
area about 934.55, and its vertices lie on a sphere of approximate radius 8.85. The
sum of the lengths of all its edges is 120 and the sum of the lengths of all its
diagonals is 240.
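
The coordinates above can be checked numerically. The following is a minimal sketch, assuming the sign choices in the two coordinate triples are independent (eight points in all); the variable names are illustrative and not from the text. It computes all 28 vertex-to-vertex distances and confirms that each is an integer.

```python
import itertools
import math
from collections import Counter

h = 7 / math.sqrt(2)  # distance between the two rectangle planes

# Top 8 x 15 rectangle at height +h/2; bottom rectangle turned a quarter turn.
top = [(sx * 4.0, sy * 7.5, h / 2) for sx in (1, -1) for sy in (1, -1)]
bottom = [(sx * 7.5, sy * 4.0, -h / 2) for sx in (1, -1) for sy in (1, -1)]
vertices = top + bottom

# All 28 distances between pairs of vertices (12 edges and 16 diagonals).
distances = [math.dist(p, q) for p, q in itertools.combinations(vertices, 2)]

# Every distance should be an integer up to rounding error.
assert all(abs(d - round(d)) < 1e-9 for d in distances)

length_counts = Counter(round(d) for d in distances)
circumradius = math.dist(vertices[0], (0.0, 0.0, 0.0))
print(sorted(length_counts.items()), round(circumradius, 2))
```

The lengths 7, 8 and 15 are the edges, and 13 and 17 are the diagonals, so the edge and diagonal sums quoted above can be read off from the counts.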


3. OTHER EXAMPLES. All of the following examples consist of two congruent 
rectangles and four congruent isosceles trapezoids that can be formed into an 
integer hexahedron in the same manner as the smallest example. 
        Rectangles           Trapezoids                  Interior
No.     sides      diag      bases      sides   diag     diag

  2     120-119     169      120-119      89     149      191
  3     88-105      137      88-105      199     221      241
  4     280-165     325      280-165     233     317      383
  5     288-175     337      288-175     235     325      395
  6     280-351     449      280-351      93     327      453
  7     352-135     377      352-135     294     366      426
  8     160-231     281      160-231     398     442      482
  9     288-330     438      288-330     217     377      487
 10     360-357     507      360-357     151     389      529
 11     280-351     449      280-351     501     591      669
 12     432-665     793      432-665     219     579      789
 13     936-75      939      936-75      623     677      727
 14     560-702     898      560-702     309     699      939
 15     432-665     793      432-665    1077    1203     1317
 16     520-1302   1402      520-1302    929    1241     1489
 17     840-1463   1687      840-1463    601    1261     1679
 18     912-1300   1588      912-1300    755    1325     1715
 19     864-1330   1586      864-1330   1509    1851     2139
 20     728-2310   2422      728-2310   1471    1961     2351


Some points of special interest in these examples should be noted. 


• The rectangles of example 2 are very close to being squares and the slant sides
  are barely tilted. These features make it very close to a perfect box although
  no perfect box could ever have a square face. Are there examples that are
  even closer?

• The rectangles of example 10 are similar to those of example 2, but the entire
  figure is not similar. The slant sides of example 10 are tilted a little more than
  are those of example 2.

• Examples 6 and 11 have the same rectangles as faces and 12 and 15 have the
  same rectangles as faces but none of these examples is similar to a smaller
  example.

• Eleven of the twenty examples have rectangles formed of primitive
  Pythagorean triples while the others don't.

• It seems strange that only one example has small positive integers for all
  lengths and the others require rather large integers.


4. LOCATING EXAMPLES. First recall Ptolemy’s Theorem. 


Theorem 1 (Ptolemy). In a cyclic quadrilateral the product of the diagonals is equal 
to the sum of the products of the opposite sides. 


A proof of Ptolemy’s Theorem can be found in Davis [1]. 

For any Pythagorean triple (a, b, c), there is at least one integer isosceles
trapezoid whose parallel sides have lengths a and b, slant sides of length e, and
diagonals of length d. When two a-by-b rectangles and four of these integer
isosceles trapezoids are used to build the hexahedron, the length of the interior
diagonals, designated by f, is the only distance which might not be an integer. This
distance f has the property that its square is the area of the rectangle, ab, added
to the square of d, the diagonal of the trapezoid. Then it is only a matter of
checking the Pythagorean triple and the associated integer isosceles trapezoids to
see if f is an integer.
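
One way to see where these relations come from is to place the vertices as in the smallest example. The coordinates and the height h between the rectangle planes used below are an assumed parametrisation, not notation from the text; with them the three lateral distances from the vertex $(a/2, b/2, h/2)$ can be written out directly.

```latex
% Assumed vertex placement: (\pm a/2, \pm b/2, h/2) and (\pm b/2, \pm a/2, -h/2).
\begin{aligned}
e^2 &= 2\left(\tfrac{b-a}{2}\right)^2 + h^2
      && \text{to } (b/2,\ a/2,\ -h/2),\\
d^2 &= \left(\tfrac{b-a}{2}\right)^2 + \left(\tfrac{a+b}{2}\right)^2 + h^2
      && \text{to } (b/2,\ -a/2,\ -h/2),\\
f^2 &= 2\left(\tfrac{a+b}{2}\right)^2 + h^2
      && \text{to } (-b/2,\ -a/2,\ -h/2),\\
d^2 - e^2 &= f^2 - d^2
      = \left(\tfrac{a+b}{2}\right)^2 - \left(\tfrac{b-a}{2}\right)^2 = ab .
\end{aligned}
```

Under this placement both $d^2 = ab + e^2$ and $f^2 = ab + d^2$ follow from the same difference of squares, and the height satisfies $h^2 = e^2 - (b-a)^2/2$.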
Essentially start with
$$a^2 + b^2 = c^2 \quad \text{(Pythagoras)},$$
then find e and d such that $d - e < \min(a, b)$ and
$$ab + e^2 = d^2 \quad \text{(Ptolemy)},$$
and see if there is an integer f such that
$$ab + d^2 = f^2.$$


Note that if f exists then $e^2$, $d^2$, $f^2$ are consecutive terms in an arithmetic
progression with common difference ab.
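
The search just described can be sketched in a few lines. This is an illustrative brute-force version, not the authors' program: the function name `find_examples` and the factorisation of ab as (d - e)(d + e) are assumptions made here, and the extra test that the height between the rectangle planes is real, $2e^2 > (b-a)^2$, is an added geometric check not stated in the text.

```python
import math

def is_square(n):
    """True when the non-negative integer n is a perfect square."""
    r = math.isqrt(n)
    return r * r == n

def find_examples(limit):
    """Return tuples (a, b, c, e, d, f) with a < b <= limit that satisfy
    a^2 + b^2 = c^2, ab + e^2 = d^2, ab + d^2 = f^2 and d - e < min(a, b)."""
    found = []
    for a in range(1, limit + 1):
        for b in range(a + 1, limit + 1):
            if not is_square(a * a + b * b):
                continue
            c = math.isqrt(a * a + b * b)
            ab = a * b
            # ab = d^2 - e^2 = (d - e)(d + e); u plays the role of d - e < a.
            for u in range(1, a):
                if ab % u:
                    continue
                v = ab // u
                if (u + v) % 2:          # d and e must both be integers
                    continue
                e, d = (v - u) // 2, (v + u) // 2
                if 2 * e * e <= (b - a) ** 2:   # added check: height must be real
                    continue
                if is_square(ab + d * d):
                    found.append((a, b, c, e, d, math.isqrt(ab + d * d)))
    return found

results = find_examples(120)
print(results)
```

With the limit set to 120 the output includes the smallest example (8, 15, 17, 7, 13, 17) as well as examples 2 and 3 of the table; integer multiples of earlier examples also appear, since the sketch does not filter out similar figures.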


5. REMARKS. Our small example might be the smallest integer hexahedron that is 
combinatorially equivalent to a perfect box but we have not proven that. 

Because of the many symmetries of the examples and the four sides being
congruent, the length of the interior diagonal had to depend only on one rectangular
shape and one trapezoidal shape. The interior diagonal of a perfect box must
depend on the three different rectangles, which could be too many restrictions to
produce an integer. Our work was more manageable than it would be if we
considered different rectangles and trapezoids or even considered other
quadrilaterals.

We would like to find a parameter that would give us infinitely many nonsimilar
examples of this type of integer hexahedron. Euler's parametric examples all
failed to yield a perfect box since the one remaining distance was irrational.
I tell them that if they will occupy themselves with
the study of mathematics they will find in it the
best remedy against the lusts of the flesh.


—Thomas Mann (1875-1955) 
The Magic Mountain, New York: Alfred A. Knopf, 1927. 
NOTES 


Edited by: John Duncan 


Calculating Normal Probabilities 


Richard J. Bagby 


For X a random variable with standard normal distribution and a > 0, the
probability that 0 < X < a is

$$P(a) = \frac{1}{\sqrt{2\pi}} \int_0^a e^{-x^2/2}\,dx.$$

This is probably the best known example of an integral that cannot be evaluated in
terms of elementary functions. In this note we develop an elementary approximation
to P(a) which arises in a natural manner, is not difficult to use, and gives
excellent results over the entire range $0 < a < \infty$. It is, in fact, more accurate than
the four-place tables of values of P(a) commonly found in statistics textbooks, and
simpler than many of the better known approximations (as described in Johnson
and Kotz [1], for example).
Our approximation to P(a) is

$$Q(a) = \frac{1}{2}\left\{1 - \frac{1}{30}\left[7e^{-a^2/2} + 16e^{-a^2(2-\sqrt{2})} + \left(7 + \frac{\pi}{4}a^2\right)e^{-a^2}\right]\right\}^{1/2},$$


a formula resulting from simple variants of well-known techniques for evaluating 
or approximating integrals. 
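
As a quick illustration, Q(a) can be compared with values of P(a) computed from the error function. This is a minimal sketch using only the standard library; the function names `P` and `Q` mirror the symbols in the text, and the identity $P(a) = \tfrac12\,\mathrm{erf}(a/\sqrt{2})$ is a standard fact rather than something stated in the note.

```python
import math

def P(a):
    """P(0 < X < a) for a standard normal X, via the error function."""
    return 0.5 * math.erf(a / math.sqrt(2))

def Q(a):
    """The elementary approximation Q(a) given in the text."""
    a2 = a * a
    bracket = (7 * math.exp(-a2 / 2)
               + 16 * math.exp(-a2 * (2 - math.sqrt(2)))
               + (7 + math.pi * a2 / 4) * math.exp(-a2))
    # max() guards against a tiny negative value from rounding when a is 0.
    return 0.5 * math.sqrt(max(0.0, 1 - bracket / 30))

# Largest absolute discrepancy on a grid of a-values from 0 to 6.
worst_error = max(abs(Q(k / 100) - P(k / 100)) for k in range(601))
print(Q(1.0), P(1.0), worst_error)
```

The grid and its range are arbitrary choices; a finer grid can be substituted to examine how the error varies with a.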
The first step is to write

$$P(a)^2 = \frac{1}{2\pi}\int_0^a\!\!\int_0^a e^{-(x^2+y^2)/2}\,dy\,dx = \frac{1}{\pi}\int_0^a\!\!\int_0^x e^{-(x^2+y^2)/2}\,dy\,dx$$

and then use polar coordinates to obtain

$$P(a)^2 = \frac{1}{\pi}\int_0^{\pi/4}\!\!\int_0^{a\sec\theta} e^{-r^2/2}\,r\,dr\,d\theta = \frac{1}{4} - \frac{1}{\pi}\int_0^{\pi/4} e^{-(a^2/2)\sec^2\theta}\,d\theta.$$

This integral is easier to approximate than the original one, chiefly because the
integrand is less variable. Indeed, it becomes constant as either $a \to 0$ or $a \to \infty$,
and just about any quadrature scheme gives the right values for constant functions.
Our formula for Q(a) comes from an obscure but extremely effective approximation
to this last integral. It uses the quadrature rule

$$\int_a^b f(x)\,dx \approx \frac{1}{30}(b-a)\left[7f(a) + 16f\!\left(\frac{a+b}{2}\right) + 7f(b)\right] - \frac{1}{60}(b-a)^2\left[f'(b) - f'(a)\right],$$

which is exact for polynomials of degree five or less. It is especially efficient for our
integral, because $f'(0) = 0$ and $f'(\pi/4) = -2a^2 f(\pi/4)$, so that only three
functional evaluations are actually used instead of the indicated five. Thus for our
integrand it is no harder to evaluate than Simpson's rule, but it is noticeably more
accurate.
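
The exactness claim is easy to test numerically. The sketch below implements the rule as a function `corrected_rule` (a hypothetical name) that takes the integrand and its derivative, then applies it to a quintic, where it should be exact, and to $x^6$, where it should not.

```python
def corrected_rule(f, df, a, b):
    """Three-point rule with an endpoint-derivative correction on [a, b]."""
    mid = (a + b) / 2
    return ((b - a) / 30 * (7 * f(a) + 16 * f(mid) + 7 * f(b))
            - (b - a) ** 2 / 60 * (df(b) - df(a)))

# A quintic: the rule should reproduce the exact integral over [0, 2], 16/3.
quintic = lambda x: x ** 5 - 2 * x ** 3 + x ** 2
d_quintic = lambda x: 5 * x ** 4 - 6 * x ** 2 + 2 * x
quintic_value = corrected_rule(quintic, d_quintic, 0.0, 2.0)

# Degree six: the exact integral of x^6 over [-1, 1] is 2/7, and the rule misses it.
sextic_value = corrected_rule(lambda x: x ** 6, lambda x: 6 * x ** 5, -1.0, 1.0)
sextic_error = 2 / 7 - sextic_value

print(quintic_value, sextic_error)
```

For the interval $[-1, 1]$ the half-width is 1 and the sixth derivative of $x^6$ is the constant 720, so the size of `sextic_error` gives the constant multiplying $h^7 f^{(6)}$ in the error term.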

Like Simpson’s rule, our quadrature formula results when the technique of 
Richardson extrapolation is used to eliminate part of the error inherent in the 
trapezoidal rule; see any numerical analysis text such as [2] for an extensive 
discussion of this topic. Since neither our formula nor its error estimate is widely 
known, we outline a derivation of both that can be extended to develop additional 
quadrature formulas. 

To start our derivation, we need a good representation for the error in the 
trapezoidal rule, one that allows us not just to bound the error but to estimate it 
accurately when f is a smooth function. This requires no more than repeated 
integration by parts, with successive antiderivatives chosen to make most of the 
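boundary terms be zero or antisymmetric. We do all this by choosing an even
polynomial K(x) such that $K^{(6)} = 1$, $K(h) = 0$, $K'(h) = 0$, and $K^{(3)}(h) = 0$. Then
we have

$$\int_{-h}^{h} f(x)\,dx = \left[K^{(5)}(x) f(x) - K^{(4)}(x) f'(x) - K''(x) f^{(3)}(x)\right]_{-h}^{h} + \int_{-h}^{h} K(x) f^{(6)}(x)\,dx$$

for all $f \in C^6$. The conditions on K require

$$K(x) = \frac{1}{720}\left(x^6 - 5h^2x^4 + 7h^4x^2 - 3h^6\right) = \frac{1}{720}\left(x^2 - h^2\right)^2\left(x^2 - 3h^2\right),$$

so that with a translation we obtain the one-step formula

$$\begin{aligned}
\int_{c-h}^{c+h} f(x)\,dx ={}& h\left[f(c+h) + f(c-h)\right] - \frac{1}{3}h^2\left[f'(c+h) - f'(c-h)\right] \\
&+ \frac{1}{45}h^4\left[f^{(3)}(c+h) - f^{(3)}(c-h)\right] \\
&+ \frac{1}{720}\int_{-h}^{h} f^{(6)}(c+x)\left(x^2 - h^2\right)^2\left(x^2 - 3h^2\right)dx.
\end{aligned}$$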


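We can now use our knowledge of the error to eliminate much of it. Applying the
one-step rule to the left and right halves separately and adding the results yields
the two-step formula

$$\begin{aligned}
\int_{c-h}^{c+h} f(x)\,dx ={}& \frac{1}{2}h\left[f(c+h) + 2f(c) + f(c-h)\right] - \frac{1}{12}h^2\left[f'(c+h) - f'(c-h)\right] \\
&+ \frac{1}{720}h^4\left[f^{(3)}(c+h) - f^{(3)}(c-h)\right] \\
&+ \frac{1}{720}\int_{-h/2}^{h/2}\left[f^{(6)}\!\left(c - \frac{h}{2} + x\right) + f^{(6)}\!\left(c + \frac{h}{2} + x\right)\right]\left(x^2 - \frac{h^2}{4}\right)^2\left(x^2 - \frac{3h^2}{4}\right)dx.
\end{aligned}$$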
Simpson's rule (with a representation of the error) comes from subtracting one-third
of the one-step formula from four-thirds of the two-step one; the second terms on
the right are then eliminated. Our quadrature rule comes from subtracting
one-fifteenth of the one-step formula from sixteen-fifteenths of the two-step one,
thereby eliminating the terms proportional to $h^4$. The terms proportional to $h^2$ are
not eliminated; they become part of our quadrature formula.

To subtract the integrals involving sixth derivatives efficiently, rewrite them in
the form

$$\int_0^h \left[f^{(6)}(c+x) + f^{(6)}(c-x)\right] p(x)\,dx.$$

Symmetry helps; in the integral over $[-h/2, h/2]$ one simply replaces x by
$\pm(x - \tfrac{1}{2}h)$. This leads eventually to the identity

$$\begin{aligned}
\int_{c-h}^{c+h} f(x)\,dx ={}& \frac{1}{15}h\left[7f(c+h) + 16f(c) + 7f(c-h)\right] - \frac{1}{15}h^2\left[f'(c+h) - f'(c-h)\right] \\
&+ \frac{1}{3600}\int_0^h \left[f^{(6)}(c+x) + f^{(6)}(c-x)\right](x-h)^4\left(5x^2 + 4hx + h^2\right)dx.
\end{aligned}$$

Since the last integral contains a polynomial with constant sign, the mean value
theorem for integrals allows us to express it as $\frac{1}{4725}h^7 f^{(6)}(\xi)$, with $\xi$ some value
between c − h and c + h. The quadrature rule we originally stated corresponds to
h = (b − a)/2 and c = (a + b)/2, and the last integral represents the error.

What does all this say about the accuracy of approximating P(a) by Q(a)? Our
representation of the error is very hard to use effectively when $f(\theta) = e^{-(a^2/2)\sec^2\theta}$.
Computing $f^{(6)}(\theta)$ is a routine but lengthy exercise; a systematic method is to call
$t = \tan\theta$ so that $f^{(n)}(\theta) = p_n(t) f(\theta)$ with $p_{n+1}(t) = (1 + t^2)\left[p_n'(t) - a^2 t\,p_n(t)\right]$.
While the individual terms in this expansion are easy to deal with, the alternating
sum in which they appear is not. Moreover, for many values of a the function
$f^{(6)}(\theta)$ has sign changes on $[0, \pi/4]$, so its weighted averages are often significantly
smaller than its extreme values. All this means that simple bounds for $\frac{1}{4725}h^7 f^{(6)}(\xi)$
are a good bit larger than the observed error. Consequently, our statements about
the accuracy of approximating P(a) by Q(a) are based not on the error formula,
but instead on a detailed comparison of computed values of Q(a) with tabulated
values of P(a).

We found that the error Q(a) − P(a) changes slowly in a and varies from about
−0.00003 near a = 0.30 to +0.00003 near a = 1.70; as expected, it vanishes both
as $a \to 0$ and as $a \to \infty$. The formula is remarkably accurate near a = 0; even the
relative error (Q(a) − P(a))/P(a) remains small as $P(a) \to 0$. We can prove this
last statement by using the estimate $e^{-t} = 1 - t + O(t^2)$ for all the exponentials.
That leads to

$$Q(a) = \frac{a}{2}\left[\frac{170 - 64\sqrt{2} - \pi}{120}\right]^{1/2} + O(a^3),$$

while $P(a) = a/\sqrt{2\pi} + O(a^3)$ as $a \to 0$.
Thus for small a, the relative error is theoretically about −0.0003, although
computing Q(a) accurately as $a \to 0$ requires some care.

Still better approximations to P(a) are available by our general method. Simply
using the same quadrature rule separately on $[0, \pi/8]$ and $[\pi/8, \pi/4]$ and adding
the results should divide the error by about 64, since for multistep applications the
error formula shows the error is $O(h^6)$. Of course, that requires evaluating $f(\theta)$ at
five points instead of three, but $f'(\theta)$ is still evaluated only at $\theta = 0$ and $\pi/4$. If
we are willing to evaluate $f(\theta)$ at five or more points, we can also use a quadrature
rule of higher order. But Q(a) already gives all the accuracy ever needed for
elementary statistics classes.
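
The two-panel refinement can be tried directly. The sketch below is an illustration rather than the author's computation: `approx_P` (a hypothetical name) applies the quadrature rule on a chosen number of equal panels of $[0, \pi/4]$ and converts the result back to an estimate of P(a). For simplicity it evaluates $f'$ at every panel endpoint, although the interior contributions cancel in the sum.

```python
import math

def f(theta, a):
    """Integrand exp(-(a^2/2) sec^2(theta))."""
    return math.exp(-0.5 * a * a / math.cos(theta) ** 2)

def df(theta, a):
    """Derivative of f with respect to theta."""
    return -a * a * math.tan(theta) / math.cos(theta) ** 2 * f(theta, a)

def rule(a, lo, hi):
    """The quadrature rule of the text on one panel [lo, hi]."""
    mid = (lo + hi) / 2
    return ((hi - lo) / 30 * (7 * f(lo, a) + 16 * f(mid, a) + 7 * f(hi, a))
            - (hi - lo) ** 2 / 60 * (df(hi, a) - df(lo, a)))

def approx_P(a, panels):
    """Estimate P(a) from P(a)^2 = 1/4 - (1/pi) * integral over [0, pi/4]."""
    width = math.pi / 4 / panels
    integral = sum(rule(a, k * width, (k + 1) * width) for k in range(panels))
    return math.sqrt(max(0.0, 0.25 - integral / math.pi))

def exact_P(a):
    return 0.5 * math.erf(a / math.sqrt(2))

for a in (0.3, 1.7):
    print(a, approx_P(a, 1) - exact_P(a), approx_P(a, 2) - exact_P(a))
```

With one panel the function reproduces Q(a); with two panels the printed errors can be compared against the factor of about 64 suggested above.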
Constrained Critical Points 


Paul Shutler 


1. INTRODUCTION. This note addresses the same problem as that posed in the 
recent paper by Cassell and Rees [1], namely the computation of the index of a 
constrained critical point. They treat it from an algebraic point of view. In Section 
2 we give an analytic treatment of the restricted Hessian which is at the heart of 
the problem. In Section 3 we show how to calculate this restricted Hessian by 
working through the first of the two examples given in [1]. In Section 4 we show 
how the result in [1], on bordered Hessians, fits more naturally into this analytic, 
rather than algebraic, framework. Specifically, by using the rank theorem of 
differential analysis, [2, Section II.7], we can pick a coordinate system in which the 
algebra of [1] is greatly simplified. 

Recall the situation. We have an object function $f: \mathbb{R}^n \to \mathbb{R}$ which we wish to
extremise on a subset M given as the zero locus of another function $g: \mathbb{R}^n \to \mathbb{R}$.
(Reference [1] treats the case of arbitrarily many constraint functions $g_i$. We shall
stick to the case of a single constraint function for the sake of clarity, but
everything we shall say extends in an obvious way to the general case.) We
introduce the Lagrangian $L = f + \lambda g$ and find a point $p \in M$ and a value $\lambda \in \mathbb{R}$
such that $\nabla_{\mathbb{R}^n} L(p) = 0$. We would like M to be not just a subset but also a
codimension one submanifold, so that we can differentiate on it. The easiest way
to ensure this is to insist that $\nabla_{\mathbb{R}^n} g \neq 0$ and then apply the rank theorem to g.
(This is consistent with the Lagrangian approach, since $\nabla_{\mathbb{R}^n} g(p) = 0$ would mean
that p was already a critical point of the unconstrained object function.) It then
follows that $\nabla_M (f|_M)(p) = 0$, hence p is a critical point of the constrained object
function $f|_M$. What kind of critical point is it?


2. THE RESTRICTED HESSIAN. The answer to this question is contained in the 
proposition below. Roughly speaking, it says that the Hessian of the constrained
object function $f|_M$ is the restriction to the tangent space of M of the Hessian of
the Lagrangian L. A Hessian matrix, however, is not something which of its nature
admits of restriction to a subspace. We must first convert it into a bilinear form. To 
understand why this step is problematic, and to explain the need for all the 
subscripts in the statement of the proposition, we must first appreciate the 
difference between these two kinds of objects. 

A Hessian matrix $(\partial^2 f/\partial x_i\,\partial x_j)$ requires a coordinate system $x = (x_1, \ldots, x_n)$ in
order to define it. A bilinear form B on a vector space V does not require a basis
in order to define it. If $C_1$ and $C_2$ are matrix representations of B with respect to
two bases of V then we have $C_2 = A^T C_1 A$, where A is the basis change matrix.
We say that the two matrices are congruent. Thus, a bilinear form is the same as a
collection of congruent matrices, one for each basis. So, if the collection of Hessian
matrices of a function f at a point p, one for each coordinate system, is to define a
bilinear form on the tangent space of the manifold in question, the Hessian
matrices must be congruent to one another. In general they are not. To see this, let
x, y be two coordinate systems. Then a simple exercise in use of the chain rule
gives

$$\frac{\partial^2 f}{\partial y_i\,\partial y_j} = \sum_{k,l} \left(\frac{\partial x_k}{\partial y_i}\right) \frac{\partial^2 f}{\partial x_k\,\partial x_l} \left(\frac{\partial x_l}{\partial y_j}\right) + \sum_k \left(\frac{\partial f}{\partial x_k}\right) \frac{\partial^2 x_k}{\partial y_i\,\partial y_j}.$$

The matrix $(\partial x_k/\partial y_i)$ is playing the rôle of the basis change matrix A, but the
second term spoils the congruence relation unless $\nabla f = (\partial f/\partial x_k)$ is zero.

As an example, consider how tempting it is to claim that
$H(f|_M) \sim H(f)|_{T_pM \times T_pM}$, where H denotes a Hessian, $\sim$ denotes congruence, and $T_pM$
denotes the tangent space at p to the manifold M. That this is false can be seen by
considering the case $f(x, y, z) = z$, $g(x, y, z) = z - x^2 - y^2$, $p = (0, 0, 0)$. Then
$H(f|_M)(p) \sim \mathrm{diag}(1, 1)$ while $H(f)(p) = 0$. What has gone wrong, of course, is
that $\nabla f(p) \neq 0$. This may seem like an elementary mistake, but in a moment we
shall be making essentially the same claim about L, so it is important to be clear
about when it is true and when it is not.

In our constrained extremisation problem we should assume that $\nabla_{\mathbb{R}^n} f(p) \neq 0$,
otherwise there would be no point in introducing the Lagrangian. But we do have
$\nabla_M (f|_M)(p) = 0$, hence we do get a well defined bilinear form on $T_pM$ which we
write $B_{M,p}(f|_M)$. Similarly, $\nabla_{\mathbb{R}^n} L(p) = 0$ gives us $B_{\mathbb{R}^n,p}(L)$. To write $B_{\mathbb{R}^n,p}(f)$,
however, would be to write nonsense, which explains why the subscripts and
restriction signs are so important.


Proposition: $B_{M,p}(f|_M) = B_{\mathbb{R}^n,p}(L)|_{T_pM}$.


Proof: Choose coordinates x at p such that M is the locus $x_n = 0$ and
$(x_1, \ldots, x_{n-1})$ are coordinates on M. That this can be done follows from applying
the rank theorem to our assumption that $\nabla_{\mathbb{R}^n} g \neq 0$. Then observe that

$$B_{M,p}(L|_M) = B_{\mathbb{R}^n,p}(L)|_{T_pM},$$

since both may be represented by the matrix $(\partial^2 L/\partial x_i\,\partial x_j)$ for $i, j = 1, \ldots, (n - 1)$.
Finally, since $g|_M = 0$, we have $L|_M = f|_M$. This completes the proof.


3. EXAMPLE. As an illustration of how to apply the above proposition we shall 
treat the first of the two examples given in [1]. Let 


$$f = x^3 + y^3 + z^3, \qquad g = x^{-1} + y^{-1} + z^{-1} - 1.$$


(Notice that our choice of g differs from that in [1] by −1 so as to make M the
zero locus.) The stationary points are then $p_1 = (3, 3, 3)$ with $\lambda = 243$,
$p_2 = (1, 1, -1)$ with $\lambda = 3$, and $p_3$, $p_4$ symmetrical with $p_2$, that is, with the same value
of $\lambda$ but with the minus sign in the other two spots.

At $p_1$ the Hessian of L in xyz-coordinates is diag(36, 36, 36) which is positive
definite. Its restriction to any subspace is therefore positive definite too, so we are
at a minimum point of $f|_M$.

At $p_2$ the Hessian of L is diag(12, 12, −12). To construct a basis for $T_{p_2}M$
observe that this space is orthogonal in the Euclidean sense to
$\nabla_{\mathbb{R}^3} g(p_2) = (-1, -1, -1)$. A suitable basis is therefore {(1, 1, −2), (−1, 1, 0)}, but restricted to
this basis the Hessian becomes diag(−24, 24) so we are at a saddle point of $f|_M$.
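
The computation in this example can be reproduced mechanically. The following is a minimal sketch in plain Python, with illustrative helper names (`hessian_L_diag`, `restrict`) that are not from the text; it uses the fact that for this f and g the Hessian of L is diagonal, with entries $6t + 2\lambda/t^3$ for each coordinate t.

```python
def grad_g(point):
    """Gradient of g = 1/x + 1/y + 1/z - 1."""
    return [-1.0 / (t * t) for t in point]

def hessian_L_diag(point, lam):
    """Diagonal of the Hessian of L = x^3 + y^3 + z^3 + lam * g (mixed partials vanish)."""
    return [6.0 * t + 2.0 * lam / t ** 3 for t in point]

def restrict(diag, basis):
    """Matrix of the bilinear form with diagonal matrix `diag` on the given basis vectors."""
    return [[sum(d * ui * vi for d, ui, vi in zip(diag, u, v)) for v in basis]
            for u in basis]

p1, lam1 = (3.0, 3.0, 3.0), 243.0
p2, lam2 = (1.0, 1.0, -1.0), 3.0

basis_p2 = [(1.0, 1.0, -2.0), (-1.0, 1.0, 0.0)]
# Both basis vectors must be orthogonal to the gradient of g at p2.
tangency = [sum(gi * vi for gi, vi in zip(grad_g(p2), v)) for v in basis_p2]

print(hessian_L_diag(p1, lam1))
print(hessian_L_diag(p2, lam2), tangency)
print(restrict(hessian_L_diag(p2, lam2), basis_p2))
```

The restricted matrix at $p_2$ has one negative and one positive diagonal entry and zero off-diagonal entries, which is the saddle point found above.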


4. THE BORDERED HESSIAN. We can view the Lagrangian as a function
$L(x, \lambda)$ of the Lagrange multiplier $\lambda$ as well as of the coordinates x. Since
$\partial L/\partial\lambda = g$ vanishes on M, $\nabla_{\mathbb{R}^n \oplus \mathbb{R}} L(p, \lambda) = 0$ so we obtain a bilinear form
$B_{\mathbb{R}^n \oplus \mathbb{R},(p,\lambda)}(L)$. The bordered Hessian is the Hessian matrix of L. We can go one
step further in the proof of the above proposition by using the rank theorem to
choose coordinates x such that $g(x) = x_n$. Then the Hessian matrix of $L = f + \lambda x_n$
takes the simple form

$$\begin{pmatrix}
 & & & * & 0 \\
 & \left(\dfrac{\partial^2 f}{\partial x_i\,\partial x_j}\right)_{i,j<n} & & \vdots & \vdots \\
 & & & * & 0 \\
* & \cdots & * & 0 & 1 \\
0 & \cdots & 0 & 1 & 0
\end{pmatrix}$$

where the asterisks mark the derivatives $\partial^2 f/\partial x_i\,\partial x_n$, $i = 1, \ldots, n - 1$, and where
we recognise the upper left hand submatrix as the Hessian of the constrained
object function $f|_M$. This matrix is similar to the corresponding matrix in [1,
Section 2]. The difference is that our choice of coordinates gives us many zero
entries, which allows us to use

$$A = \begin{pmatrix}
 & & & 0 & 0 \\
 & I_{n-1} & & \vdots & \vdots \\
 & & & 0 & 0 \\
0 & \cdots & 0 & 1 & 0 \\
-* & \cdots & -* & 0 & 1
\end{pmatrix}$$


where $I_{n-1}$ is the $(n - 1) \times (n - 1)$ identity matrix, in a congruence
transformation $A^T(\ )A$ to eliminate the asterisks. Diagonalising the lower right hand two by
two matrix we deduce

$$B_{\mathbb{R}^n \oplus \mathbb{R},(p,\lambda)}(L) \sim B_{M,p}(f|_M) \oplus \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

that is, the bordered Hessian is congruent to the constrained Hessian summed with
a saddle point pair. This is the main result in [1]. Applying it to the above example
thus requires the diagonalisation of a four by four matrix which, although not
difficult to do, is rather tiresome. It is worth emphasising again at this point the
importance of the distinction between Hessian matrices and bilinear forms. It is
the condition $\nabla L(p, \lambda) = 0$ which allows us to use bilinear forms to bridge the gap
between the above rather special coordinates and the coordinates we are given in
the example. Otherwise it would not be clear that our four by four matrix,
diagonalised or not, had anything to do with the local behaviour of $f|_M$.


5. CONCLUSION. We have presented two methods for discerning the nature of a 
constrained critical point. We can construct a basis for the tangent space of the 
submanifold and diagonalise the (n — 1) x (n — 1) restricted Hessian matrix. 
Alternatively, we can diagonalise the (n + 1) X (n + 1) bordered Hessian matrix. 
Which of these two methods is to be preferred is perhaps a matter of taste, 
although in the above example the former turned out to be the easier. 
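
For comparison, the second method can also be carried out by machine on the example of Section 3. The sketch below is an assumption-laden illustration: it counts the signs of the pivots produced by symmetric elimination in exact rational arithmetic, which gives the numbers of positive and negative directions provided no zero pivot is met (true for these two matrices, but not in general). The names `inertia` and `bordered` are hypothetical.

```python
from fractions import Fraction

def inertia(matrix):
    """Return (positive, negative) pivot counts of a symmetric matrix.
    Assumes every pivot encountered is nonzero; raises otherwise."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    pos = neg = 0
    for k in range(n):
        pivot = m[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: reorder the variables")
        if pivot > 0:
            pos += 1
        else:
            neg += 1
        # Eliminate column k below the pivot (a congruence transformation).
        for i in range(k + 1, n):
            factor = m[i][k] / pivot
            for j in range(k + 1, n):
                m[i][j] -= factor * m[k][j]
    return pos, neg

def bordered(point, lam):
    """Bordered Hessian of L = x^3 + y^3 + z^3 + lam * (1/x + 1/y + 1/z - 1)
    in the variables (x, y, z, lam), at an integer point."""
    hess = [6 * t + 2 * Fraction(lam) / t ** 3 for t in point]
    grad = [-Fraction(1, t * t) for t in point]
    rows = [[hess[i] if i == j else 0 for j in range(3)] + [grad[i]]
            for i in range(3)]
    rows.append(grad + [0])
    return rows

print(inertia(bordered((3, 3, 3), 243)))    # at p1
print(inertia(bordered((1, 1, -1), 3)))     # at p2
```

Removing one positive and one negative direction for the saddle point pair leaves two positive directions at $p_1$ and one of each sign at $p_2$, in agreement with the restricted Hessian.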
A Cone Eversion 


S. Tabachnikov 


In the punctured plane $\mathbb{R}^2 - (0, 0)$ two functions are given: $f_0(x, y) = \sqrt{x^2 + y^2}$
and $f_1(x, y) = -\sqrt{x^2 + y^2}$. Their gradients are the constant radial vector fields
(Fig. 1). Certainly, these fields are homotopic as nondegenerate vector fields (that
is, they can be included into a continuous one-parameter family of vector fields
without zeroes in the punctured plane): just rotate each vector through 180°. Can
one perform such a homotopy in the class of nondegenerate gradient vector fields?
In other words, can one include the functions $f_0(x, y)$ and $f_1(x, y)$ into a
one-parameter family of smooth functions $f_t(x, y)$ without critical points in the
punctured plane, continuously depending upon the parameter t? The reader is
encouraged to make his/her own attempt (warning: rotation of the vectors in the
same sense does not work; the field in Fig. 2 is not a gradient!)


Figure 1.

Figure 2.

Formulate the problem more geometrically. Fix an open annulus in the horizontal
x,y-plane, say, the annulus $1 < \sqrt{x^2 + y^2} < 3$. It is diffeomorphic to the
punctured plane, so it is enough to solve the problem in the annulus (indeed, if
one has a family of functions $f_t$ without critical points in the annulus, composing it
with the diffeomorphism yields a desired family of functions in the punctured
plane). Consider the function $f_t(x, y)$ as the height function of a surface $S_t$ in
space, whose projection onto the horizontal plane is the fixed annulus. The
surfaces $S_0$ and $S_1$ are cones (Fig. 3); the latter is the “lamp” and the former the
“lump” (in analogy with the “cap” ∩ and the “cup” ∪). What one wants to
achieve is a deformation of the “lump” $S_0$ to the “lamp” $S_1$, so that each
intermediate surface $S_t$ does not have a horizontal tangent plane at any point.
Figure 3. Lump and Lamp.


Here is an example of the deformation in question. The surface $S_t$ is given by
the equation

$$z = g_t(\alpha) + 0.25\,(r - 2)\,h_t(\alpha)$$

in the cylindrical coordinates $(\alpha, r, z)$; here $0 \leq \alpha < 2\pi$, $1 < r < 3$ and the “time”
parameter t varies from 0 to 4. The functions g and h are:

$$\begin{aligned}
g_t &= t\sin\alpha, & h_t &= (1 - t) + t(0.5 + \cos\alpha); & t &\in [0, 1];\\
g_t &= (2 - t)\sin\alpha + (t - 1)\sin 2\alpha, & h_t &= \cos\alpha + 0.5(2 - t); & t &\in [1, 2];\\
g_t &= -(t - 2)\sin\alpha + (3 - t)\sin 2\alpha, & h_t &= \cos\alpha - 0.5(t - 2); & t &\in [2, 3];\\
g_t &= -(4 - t)\sin\alpha, & h_t &= -(t - 3) + (4 - t)(\cos\alpha - 0.5); & t &\in [3, 4].
\end{aligned}$$

The reader may (but probably will not) verify that, for all values of t, the function
$z_t(\alpha, r)$ does not have critical points.
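
A reader who prefers a numerical spot check to a hand verification might proceed as follows. This is only a sketch on a finite grid, not a proof, and the criterion it uses is an assumption spelled out here: at a given angle there is no critical point on the whole radial segment if either the radial derivative $0.25\,h_t$ is nonzero, or $|g_t'| > 0.25\,|h_t'|$ (so that the angular derivative cannot vanish for $|r - 2| < 1$). The function names are illustrative.

```python
import math

EPS = 0.25  # coefficient of (r - 2) in the equation of the surface

def derivatives(t, a):
    """Return (g', h, h') at time t and angle a, from the four formulas above."""
    s, c, c2 = math.sin(a), math.cos(a), math.cos(2 * a)
    if t <= 1:
        return t * c, (1 - t) + t * (0.5 + c), -t * s
    if t <= 2:
        return (2 - t) * c + 2 * (t - 1) * c2, c + 0.5 * (2 - t), -s
    if t <= 3:
        return -(t - 2) * c + 2 * (3 - t) * c2, c - 0.5 * (t - 2), -s
    return -(4 - t) * c, -(t - 3) + (4 - t) * (c - 0.5), -(4 - t) * s

def margin(t, a):
    """Positive when no point with this t and angle can be critical for 1 < r < 3."""
    dg, h, dh = derivatives(t, a)
    return max(EPS * abs(h), abs(dg) - EPS * abs(dh))

# Sample 401 times in [0, 4] and 720 angles in [0, 2*pi).
worst_margin = min(margin(4 * i / 400, 2 * math.pi * j / 720)
                   for i in range(401) for j in range(720))
print(worst_margin)
```

A strictly positive value of `worst_margin` means that no sampled time and angle admits a critical point; finer grids can be substituted at the cost of running time.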

One could stop here; but I believe that I owe the reader some explanations.
First, the existence of the homotopy in question is a very particular consequence of
the Gromov-Phillips theorem and the Gromov h-principle theory (see [G] and [H]).
The proofs in this theory are by no means constructive, so explicit constructions
are of interest. A famous example is turning a sphere inside out, another
consequence of the Gromov theory (more precisely, of its predecessor, the
Hirsch-Smale theorem); see, e.g., [Fr] or the movie under preparation at the Minnesota
Geometry Center. The problem we are concerned with here was mentioned in [F]
(and was given to me by my advisor D. Fuchs some 17 years ago; I believe the
present construction is similar to a somewhat obscure one I produced then).

Secondly, I should like to explain how the above formulas came up. Since the
original and the terminal functions are linear in r, it is natural to look for the
function $z_t$ in the form:

$$z_t(\alpha, r) = g_t(\alpha) + \varepsilon (r - 2) h_t(\alpha),$$

where g and h are periodic functions and $\varepsilon$ is a small parameter to be chosen. The
original “lump” surface corresponds to $g_0(\alpha) = 0$ and $h_0(\alpha) = \text{const} > 0$; the
terminal “lamp” corresponds to $g_4(\alpha) = 0$ and $h_4(\alpha) = \text{const} < 0$. It might be instructive to
think about the surface $S_t$ as a sort of closed rope ladder in space, whose axis is
the curve

$$z = g_t(\alpha), \quad 0 \leq \alpha < 2\pi, \quad r = 2,$$

and whose rungs are the radial segments

$$z = g_t(\alpha) + \varepsilon (r - 2) h_t(\alpha), \quad \alpha = \text{const}, \quad 1 < r < 3$$

with the slope of $\varepsilon h_t(\alpha)$. So, at the beginning, the axis is a horizontal circle and the
slopes of all the rungs are positive. At the end, the axis is again a horizontal circle,
but the slopes of the rungs are all negative.

What one wants to avoid in the deformation are any points where the axis and
the rungs are simultaneously horizontal. Thus the functions

$$\frac{dg_t(\alpha)}{d\alpha} + \varepsilon (r - 2) \frac{dh_t(\alpha)}{d\alpha} \quad \text{and} \quad h_t(\alpha)$$

should not have common zeros. If, for some t, the zeros of

$$\frac{dg_t(\alpha)}{d\alpha} \quad \text{and} \quad h_t(\alpha)$$

are disjoint, then so are the zeroes of

$$\frac{dg_t(\alpha)}{d\alpha} + \varepsilon (r - 2) \frac{dh_t(\alpha)}{d\alpha} \quad \text{and} \quad h_t(\alpha)$$

for a sufficiently small $\varepsilon$ (use continuity and compactness of the circle). By
compactness of the t-interval this $\varepsilon$ can be chosen uniformly for all $t \in [0, 4]$.

The strategy is clear now. First, change the shape of the axis of the rope ladder
(i.e., the graph of $g_t(\alpha)$) into a non-horizontal curve, after which one can safely
change the slope of the rungs (the sign of $h_t(\alpha)$) from positive to negative on its
non-horizontal segments.

The graphs of $g_t(\alpha)$ are sketched in Fig. 4. The graphs are drawn in solid or
broken lines; the former means that $h_t(\alpha)$ is positive, and the latter that it is
negative at the corresponding points $\alpha$. The half-way picture (t = 2) is symmetric
with respect to the time eversion $t \to 4 - t$; from that point on one just repeats
the process backwards (should one call this half-way surface $S_2$ the “limp”?). The
reader is encouraged to use his/her favorite software to visualize the “limp” $S_2$.
Fig. 5 shows the corresponding homotopy of the gradient vector fields, thus
answering the original question.
Answer to Picture Puzzle 
(p. 22) 


Cleve Moler, the principal creator of MATLAB. 
THE COMPUTER SCIENCE SAMPLER 
Edited by: Catherine C. McGeoch 


Approximation Algorithms: 
Good Solutions to Hard Problems 


Ran Libeskind-Hadas 


1. INTRODUCTION. Consider a computer network represented by an undirected 
graph where the vertices represent computer nodes and the edges represent links 
between the nodes. Since some of the links in the network may become faulty, link 
testing devices are placed at some of the nodes. A tester at a particular node can 
test all links incident to that node. Since the testers are expensive, however, we 
wish to deploy the minimum number of these devices such that every link is 
incident to at least one node containing a tester. In graph theoretic terms, a vertex 
cover is a subset of the vertices such that every edge is incident to at least one 
vertex in this set. Our objective then is to find a minimum vertex cover. This is 
known as the vertex cover problem. 

The vertex cover problem is one of many computational problems known to be 
NP-complete (see “Turing Machines and Computational Complexity” in the 
January 1994 issue of the Monthly). NP-complete problems can be solved in a 
number of steps that grows exponentially in the size of the problem, but no 
“efficient” algorithms are known for these problems. By “efficient” we mean that 
the number of steps, or time, is bounded by some polynomial in the size of the 
problem. In fact, not only are no polynomial time algorithms known for NP-com- 
plete problems, but the theory of NP-completeness tells us that if a polynomial 
time algorithm is found for any single NP-complete problem, then all NP-com- 
plete problems are solvable in polynomial time. Theoretical computer scientists 
generally believe, but have so far been unable to prove, that there do not exist 
polynomial time algorithms for NP-complete problems. 

Let us reconsider the vertex cover problem. A simple algorithm enumerates all 
possible subsets of the vertices in increasing order of cardinality, and tests each set 
to see if it is a vertex cover for the graph. This process terminates when the first 
vertex cover is discovered. In the worst case, this algorithm will terminate at the 
very last set, since the set of all vertices is certainly a vertex cover. For a graph with 
n vertices, essentially $2^n$ subsets must be considered by the algorithm in the worst
case. For example, if this algorithm were applied to deploy link testers in a network
with 100 nodes and the algorithm were executed on a supercomputer capable of
considering $10^{12}$ subsets per second, the computer would require over 40 billion
years to consider all possible subsets! Since the vertex cover problem is NP-com- 
plete, it is unlikely that a dramatically faster algorithm can be found for this 
problem. 
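
The enumeration described above can be sketched in a few lines of Python. This is only an illustration: the function names and the 5-cycle example are assumptions introduced here, and the running-time estimate assumes a rate of 10^12 subsets per second, which is the rate consistent with the figure of 40 billion years.

```python
from itertools import combinations


def is_vertex_cover(edges, subset):
    """Return True if every edge has at least one endpoint in subset."""
    chosen = set(subset)
    return all(u in chosen or v in chosen for u, v in edges)


def brute_force_min_vertex_cover(vertices, edges):
    """Try subsets in increasing order of cardinality; return the first cover."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_vertex_cover(edges, subset):
                return set(subset)


# Hypothetical example: a cycle on 5 vertices needs 3 vertices to cover it.
cycle_vertices = [0, 1, 2, 3, 4]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
smallest_cover = brute_force_min_vertex_cover(cycle_vertices, cycle_edges)

# Rough time to enumerate all 2**100 subsets at 10**12 subsets per second.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_needed = 2**100 / 10**12 / SECONDS_PER_YEAR
print(len(smallest_cover), f"{years_needed:.2e} years")
```

The search stops at the first cover it meets, so the set returned has minimum size, but the number of subsets examined can still grow like $2^n$.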

What, then, can we do when confronted with an NP-complete problem such as
the vertex cover problem? One approach is to use heuristic algorithms. These 
algorithms employ simple rules of thumb and, consequently, tend to be very fast. 
However, heuristics do not guarantee that an optimal solution, or even anything 
close to an optimal solution, will be found. A natural heuristic for the vertex
cover problem, for example, begins by selecting a vertex of highest degree (that is, a
vertex with the maximum number of edges incident to it), in this way “covering” as
many edges as possible. This step is repeated until every edge is covered. Unfortunately,
there are many graphs for which this heuristic performs quite poorly. In
fact, for any positive value of $\alpha$, it is possible to construct a graph such that the
solution found by the heuristic on this graph is $\alpha$ times larger than the optimal
solution [8]! It would certainly be much more desirable to have an algorithm that
finds a vertex cover that is guaranteed to be at most some fixed constant times 
larger than optimal. Such an algorithm is called an approximation algorithm and an 
approximation algorithm that runs in polynomial time is called a polynomial time 
approximation algorithm. The mere existence of polynomial time approximation 
algorithms is somewhat surprising, since we have no efficient way of determining 
optimal solutions to NP-complete problems. Using a number of clever techniques, 
however, researchers have discovered approximation algorithms for many impor- 
tant NP-complete problems. 
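
For concreteness, the highest-degree heuristic mentioned above might be written as follows. This is a sketch under assumptions made here: the function name is hypothetical, edges are given as pairs of comparable vertex labels, and ties are broken in favor of the smallest label, a choice the text does not specify.

```python
def greedy_degree_cover(edges):
    """Repeatedly pick a vertex of highest remaining degree until all edges are covered."""
    remaining = {frozenset(edge) for edge in edges}  # undirected edges
    cover = []
    while remaining:
        # Count how many uncovered edges touch each vertex.
        degree = {}
        for edge in remaining:
            for vertex in edge:
                degree[vertex] = degree.get(vertex, 0) + 1
        # Assumed tie-break: among vertices of maximum degree, take the smallest label.
        best = max(sorted(degree), key=lambda v: degree[v])
        cover.append(best)
        remaining = {edge for edge in remaining if best not in edge}
    return cover


star_cover = greedy_degree_cover([(0, 1), (0, 2), (0, 3), (0, 4)])
path_cover = greedy_degree_cover([(1, 2), (2, 3), (3, 4)])
print(star_cover, path_cover)
```

On these two small graphs the heuristic happens to find optimal covers; as the text notes, it carries no guarantee of doing so in general.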


2. APPROXIMATION ALGORITHMS. We begin by describing a surprisingly 
simple polynomial time approximation algorithm for the vertex cover problem. Let 
G = (V, E) denote a given graph. The algorithm comprises the following steps: 


1. S is initially the empty set. 

2. While edges remain in the graph, select an edge (u, v) arbitrarily. Add u 
and v to the set S and remove u, v, and all edges incident to these vertices 
from G. 


We claim that when this algorithm terminates, S is a vertex cover for graph G 
and the cardinality of S is at most twice that of a vertex cover of minimum size. 

The first part of this claim is easily established, since at any step the edges 
remaining in G are exactly those edges that are not yet covered by vertices in S. 
To verify the second part of this claim, let $E' = \{e_1, \ldots, e_k\}$ denote the set of edges
selected by the algorithm. By definition, every vertex cover must include at least 
one of the two endpoints of each of these edges. Observe also that these edges 
have no vertices in common, since once an edge is selected, both of its endpoints 
and all incident edges are removed from G. Therefore, every vertex cover, and in 
particular a minimum vertex cover, must have size at least k. However, the vertex 
cover constructed by this algorithm has size exactly 2k since S consists of both 
endpoints of each edge in $E'$. Thus, this algorithm obtains a vertex cover that is at
most twice as large as a minimum vertex cover. Finally, it is not difficult to show
that this algorithm runs in a number of steps that grows polynomially (in fact 
linearly) in the number of vertices and edges in the graph. 
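
The two-step algorithm analyzed above admits a very short implementation. The sketch below is one possible rendering, with a hypothetical function name. Instead of physically deleting vertices and edges, it scans the edge list once and treats an edge as still present exactly when neither endpoint has been added to S. Scanning in list order is just one way of selecting edges "arbitrarily".

```python
def approx_vertex_cover(edges):
    """Return a vertex cover at most twice the minimum size."""
    cover = set()
    for u, v in edges:
        # The edge (u, v) remains in the graph only if neither endpoint is in S.
        if u not in cover and v not in cover:
            cover.update((u, v))  # add both endpoints, as in step 2
    return cover


# Path 1-2-3-4: the algorithm selects edges (1, 2) and (3, 4).
path_edges = [(1, 2), (2, 3), (3, 4)]
path_result = approx_vertex_cover(path_edges)
print(sorted(path_result))
```

On this path the result has four vertices while {2, 3} is a cover of size two, so the example also illustrates that the factor of 2 can actually be attained. Each edge is examined once, which matches the linear running time claimed above.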

We have demonstrated a polynomial time approximation algorithm that finds 
vertex covers that are at most twice as large as optimal. In fact, our analysis is 
tight: It is not difficult to construct graphs for which this algorithm finds vertex 
covers that are exactly twice as large as optimal. In general, let A denote an
approximation algorithm and let A(I) denote the size of the solution obtained by
this algorithm for a particular instance I of the problem. Similarly, let OPT(I)
denote the size of an optimal solution for instance I of the problem. We define the
ratio $R_A(I)$ by

$$R_A(I) = \frac{A(I)}{\mathrm{OPT}(I)}$$

and the absolute performance ratio $R_A$ of algorithm A is defined by

$$R_A = \inf\{r \mid R_A(I) \le r \text{ for all instances } I \text{ of the problem}\}.$$


Is it possible that more sophisticated approximation algorithms for the vertex 
cover problem achieve absolute performance ratios better than 2? The answer is 
indeed “yes”, although surprisingly the best algorithm currently known improves
this ratio only slightly to $2 - \frac{\log\log n}{2\log n}$, where n is the number of vertices
in the graph [2]. Thus, asymptotically, this algorithm is no better than our simple
approximation algorithm. Generalizing the notion of the absolute performance
ratio, the asymptotic performance ratio $R_A^\infty$ of algorithm A is defined by

$$R_A^\infty = \inf\{r \mid \exists N_0 \text{ s.t. } R_A(I) \le r \text{ for all instances } I \text{ of the problem s.t. } \mathrm{OPT}(I) \ge N_0\}.$$


We now turn to another problem, known as the bin packing problem, for which 
approximation algorithms with much better absolute and asymptotic performance 
ratios are known. In the bin packing problem we are given a finite set of items, 
each with size between 0 and 1. Our objective is to pack these items into unit 
capacity bins, minimizing the total number of bins used. More formally, let 
$I = \{s_1, s_2, \ldots, s_n\}$, $\forall i,\ s_i \in [0, 1]$, denote the set of items. We wish to partition I
into disjoint subsets (bins) $B_1, B_2, \ldots, B_k$ such that $\forall j,\ \sum_{s_i \in B_j} s_i \le 1$ and k is as
small as possible.

Like the vertex cover problem, the bin packing problem is NP-complete. Like 
the vertex cover problem as well, a very simple polynomial time approximation 
algorithm for bin packing finds solutions that are at most twice as large as optimal. 
This approximation algorithm, known as the first fit algorithm, operates as follows: 
Select one item at a time in arbitrary order and place this item in the first bin 
which can accommodate it. 
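
A minimal sketch of the first fit rule follows. The function name, the list-of-lists representation of bins, and the small floating-point tolerance are assumptions made for this illustration.

```python
def first_fit(items, capacity=1.0):
    """Pack items in the given order, each into the first bin with enough room."""
    bins = []  # each bin is the list of item sizes placed in it
    for size in items:
        for contents in bins:
            # Small tolerance so that rounding error does not reject an exact fit.
            if sum(contents) + size <= capacity + 1e-12:
                contents.append(size)
                break
        else:
            bins.append([size])  # no existing bin can accommodate the item
    return bins


packing = first_fit([0.4, 0.7, 0.3, 0.6, 0.2])
print(len(packing), packing)
```

Here the item sizes sum to 2.2, so no packing can use fewer than three bins, and first fit uses exactly three. Recomputing `sum(contents)` for every candidate bin keeps the sketch short at the cost of extra work; storing a running load per bin would avoid that.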

The ratio of 2 for the first fit algorithm follows from two observations. First, we 
show that when the algorithm terminates, at most one of the used bins is half full 
or less. Assume that this is not the case. Then when the algorithm terminates, 
there are two bins $B_i$ and $B_j$, $i < j$, that are each at most half full. Then the last
item placed in $B_j$ clearly has size at most $\frac{1}{2}$. Since bin $B_i$ has unused capacity at least $\frac{1}{2}$
throughout execution of the algorithm, the first fit algorithm would have placed
this item in $B_i$ rather than in $B_j$, a contradiction. Now, letting FF(I) denote the
number of bins used by the first fit algorithm on a given problem instance I, this
observation implies that 


$$\mathrm{FF}(I) \le \Bigl\lceil 2 \sum_{s_i \in I} s_i \Bigr\rceil.$$


Our second observation is that the total number of bins used in any solution is at 
least the sum of the sizes of all the items. In particular, letting OPT(I) denote the
number of bins used in an optimal solution, we have


$$\Bigl\lceil \sum_{s_i \in I} s_i \Bigr\rceil \le \mathrm{OPT}(I).$$


Combining these two observations, we now have 
$$\mathrm{FF}(I) \le 2 \cdot \mathrm{OPT}(I)$$
and thus $R_{FF} \le 2$.
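
The step from the first observation to the bound on FF(I) is brief, so it may help to spell it out. The following is a sketch in which the symbol $S$ for the total size of the items is introduced here for convenience, and at least one item is assumed to have positive size. Since at most one used bin is half full or less, at least $\mathrm{FF}(I) - 1$ bins are more than half full, and therefore

$$S = \sum_{s_i \in I} s_i > \frac{\mathrm{FF}(I) - 1}{2} \;\Longrightarrow\; \mathrm{FF}(I) - 1 < 2S \;\Longrightarrow\; \mathrm{FF}(I) \le \lceil 2S \rceil \le 2\lceil S \rceil \le 2 \cdot \mathrm{OPT}(I).$$

The middle implication uses the fact that FF(I) is an integer, and the last inequality is the second observation.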

The above analysis shows that the absolute performance ratio of the first fit
algorithm is at most 2. In fact, more careful analysis shows that for all instances I
of the bin packing problem

$$\mathrm{FF}(I) \le \frac{17}{10}\,\mathrm{OPT}(I) + 2$$

and that there exist instances I with arbitrarily large values of OPT(I) such that

$$\mathrm{FF}(I) \ge \frac{17}{10}\,(\mathrm{OPT}(I) - 1).$$

Therefore, the asymptotic performance ratio $R_{FF}^\infty$ of the first fit algorithm is in fact
1.7. Moreover, a minor modification of the first fit algorithm achieves an even
better performance ratio. The modified algorithm, known as the first fit decreasing
algorithm, is identical to the first fit algorithm except that items are selected for
insertion in the bins in decreasing order of size. The analysis of this algorithm,
which is quite long and complicated, shows that this modification results in an
asymptotic performance ratio of 11/9 [5].
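
The effect of sorting can be seen on a small instance. The sketch below uses hypothetical function names and a floating-point tolerance chosen for this illustration. It counts the bins used by first fit on a given order and on the same items sorted into decreasing order.

```python
def first_fit_count(items, capacity=1.0):
    """Number of bins first fit uses when items arrive in the given order."""
    loads = []  # current total size in each open bin
    for size in items:
        for index, load in enumerate(loads):
            if load + size <= capacity + 1e-12:  # tolerance for float rounding
                loads[index] = load + size
                break
        else:
            loads.append(size)  # open a new bin
    return len(loads)


def first_fit_decreasing_count(items, capacity=1.0):
    """First fit applied to the items in decreasing order of size."""
    return first_fit_count(sorted(items, reverse=True), capacity)


sizes = [0.3, 0.3, 0.3, 0.7, 0.7, 0.7]
bins_in_given_order = first_fit_count(sizes)
bins_decreasing = first_fit_decreasing_count(sizes)
print(bins_in_given_order, bins_decreasing)
```

In the given order the three small items share one bin and each large item then needs its own, for four bins in all; in decreasing order each large item is later topped up by a small one, giving three bins, which is optimal here since the sizes sum to 3.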


3. APPROXIMATION SCHEMES. Do approximation algorithms exist for all NP- 
complete problems? Unfortunately, it appears that the answer to this question is 
probably “no”. For many NP-complete problems, including the infamous traveling 
salesperson problem, it can be shown that the existence of a polynomial time 
approximation algorithm with any fixed performance ratio would imply that 
P = NP, that is, all NP-complete problems could be solved exactly in polynomial 
time. 

On the other hand, for some NP-complete problems we can do even better than 
finding approximation algorithms with fixed performance ratios. For many impor- 
tant problems there exist families of approximation algorithms that allow us to 
obtain performance ratios arbitrarily close to 1 in exchange for increasingly larger 
polynomial time bounds. A polynomial time approximation scheme (PTAS) is a 
family of approximation algorithms $\{A_\epsilon \mid \epsilon > 0\}$ where for each $\epsilon > 0$, $A_\epsilon$ is a
polynomial time approximation algorithm with absolute ratio bound $R_{A_\epsilon}$ at most
$1 + \epsilon$.

Although it is unlikely that PTAS can be found for all NP-complete problems 
(since this would imply approximation algorithms for all NP-complete problems 
and thus that P = NP), it is natural to ask whether they at least exist for all 
problems with approximation algorithms. In a result hailed by many theoretical 
computer scientists as one of the most important in the field in over two decades, a 
group of researchers from Berkeley, Stanford, and Bell Labs showed in 1992 [1] 
that this too would imply that P = NP. Specifically, it was shown that if a PTAS 
exists for any problem in a rich subset of the NP-complete problems known as the 
MAXSNP-complete problems, then P = NP. Among the many important prob- 
lems known to be MAXSNP-complete is the vertex cover problem. 


4. FURTHER READING. Garey and Johnson’s [4] classic text offers an eminently 
readable introduction to NP-completeness, including a discussion of approximation 
algorithms and schemes. Texts by Papadimitriou and Steiglitz [8] and Cormen, 
Leiserson, and Rivest [3] have very good discussions and a number of illustrative 
examples. Motwani’s technical report on approximation algorithms [7] is also
excellent. Finally, the recent result on the hardness of approximating
MAXSNP-complete problems appeared in [1], accompanied by an entertaining story in the
New York Times [6].

Mathematics, while giving no quick remuneration, like the art of stenography or the
craft of bricklaying, does furnish the power of deliberate thought and accurate
statement, and to speak the truth is one of the most social qualities a person can possess.
Gossip, flattery, slander, deceit, all spring from a slovenly mind that has not been
trained in the power of truthful statement, which is one of the highest utilities.


—S. T. Dutton

THE EVOLUTION OF...
Edited by Abe Shenitzer 
Mathematics, York University, North York, Ontario M3J 1P3, Canada 


Four Significant Axiomatic Systems 
and Some of the Issues 
Associated with Them 


Stefan Mykytiuk and Abe Shenitzer 


(a) Greek axiomatics and Euclid’s geometry. One of the greatest intellectual 
achievements of the Greeks was the axiomatic method, a method for the system- 
atic discovery of presumably absolute truths based on the application of logic to 
postulates and axioms. Postulates, to the Greeks, were “request(s) that something 
be allowed.” More specifically, they were elementary, presumably obvious, truths 
relating to a particular discourse. (For example, the first of Euclid’s postulates is: 
“A straight line can be drawn from any point to any point.”) Axioms, to the 
Greeks, were elementary, presumably obvious, truths of a general nature. (For 
example, the first of Euclid’s axioms is: “Things which are equal to the same thing
are also equal to one another.”) Euclid preceded his postulates and axioms with 
“Initial explanations and definitions” that suggest meanings and images the reader 
should attach to the terms of the discourse. They make it clear that one is dealing 
with abstractions from various physical objects. Euclid’s geometry is the first 
known extensive example of what we now call an axiomatic structure. 

Given some of the uses they made of Euclidean geometry, it is safe to say that 
the Greeks regarded it as a blueprint for the metric relations in the real world. Its 
uniqueness was unquestioned. 


(b) Hyperbolic geometry and some effects of its discovery. Euclid relied on five
postulates. The fifth of these, a kind of fly in the ointment, is the famous Euclidean
parallel postulate:


If a straight line falling on two straight lines makes the interior angles on the same side together 
less than two right angles, the two straight lines, if produced indefinitely, meet on that side on 
which the angles are together less than two right angles.


“Now whatever else this postulate may be, self-evident it is not, and this was early
perceived.” [2] The commentator Proclus (5th century AD) objected to it by citing
the asymptotic behavior of certain lines (= curves) and asking: “May not the same
thing be possible in the case of straight lines... ?” Attempts to deduce the parallel 
postulate from the remaining postulates and the axioms were undertaken by 
various mathematicians for over 2000 years. All of them ended in failure. 




A variant of Euclid’s parallel postulate is the assertion: If $l$ is a straight line and
P a point not on $l$, then the number of straight lines through P that do not
intersect $l$ is just one. Its negation, obtained by replacing “is just one” by “is
greater than one,” is the so-called hyperbolic parallel postulate.

Around 1800 a few mathematicians began to experiment with the system of 
postulates and axioms obtained from Euclid’s system by replacing the Euclidean 
parallel postulate by the hyperbolic parallel postulate. The latter is not an 
abstraction from sense impressions but a logical alternative to the Euclidean 
parallel postulate. It was Gauss, Lobachevski and Bolyai who explored the new 
“not-Euclidean” geometry, based on this modified foundation, in greatest depth. 
Lobachevski’s investigations were more varied and extensive than those of either 
Gauss or Bolyai. Lobachevski and Gauss* carried out inconclusive physical experi- 
ments to determine which of the two geometries fitted physical space best, and 
formulated views of geometry radically different from the traditional ones inher- 
ited from the Greeks. Gauss’ view is made clear by the following well-known 
quotations from his letters: 


I come more and more to the view that the necessity of our geometry cannot be proved... . 
Perhaps we shall come to another insight in another life into the nature of space, which is 
unattainable for us now. But until then one must not rank Geometry with Arithmetic which is 
truly a priori, but with Mechanics... (From a letter to Olbers in 1817.) 


It is my deepest conviction that the positions of the science of space and of the pure science of 
magnitude vis-a-vis our knowledge a priori differ greatly; our knowledge of the former has none 
of the complete conviction of necessity (and thus also of absolute truth) associated with the 
latter; we must admit in all humility that while number is the product of our mind alone, space 
has also a reality outside our mind whose laws we cannot completely prescribe a priori. (From a 
letter to Bessel in 1830.) 


Lobachevski’s sophisticated and far less well-known view is made clear by the 
following quotation and comment: 


“In theory, nothing prevents us from assuming that the angle sum of a rectilinear triangle is less 
than two right angles... . The assumption that the angle sum of a triangle is less than two right 
angles is admissible only in Analytics, for measurement in nature does not reveal the slightest 
deflection of this sum from a half circle.” 

This means...that Lobachevski views the generalized geometry as a mental, imaginary 
construction which makes sense only as an analytic generalization. What justifies it is not its 
possible use for purposes of measurement but its usefulness for all mathematics. ... [For him 
its] acceptability ...derives from his view of a mathematical theory as a method. [5] 


The first published accounts of hyperbolic geometry—by Lobachevski in 1829 
and by Bolyai in 1832—had no immediate effect on the work of other mathemati- 
cians. Some of the mathematicians who were aware of the new geometric system 
were inclined to regard it as an aberration rather than as, in some sense, a valid 
alternative to Euclidean geometry. This began to change around 1860, as a result 
of the publication of the correspondence between Gauss and Schumacher. 


*In his Gauss, a Biographical Study (Springer, 1981), W. K. Bühler doubts the claim that Gauss tried
to determine the angular defect of a triangle determined by three mountain peaks (see p. 100). 

One of Gauss’ letters referred to his abiding interest in, and contributions to,
hyperbolic geometry and to Lobachevski’s masterly development of that geometry. 
Coming from Gauss, this letter generated a wave of interest in hyperbolic geome- 
try. This interest was a key factor that ushered in a series of momentous 
discoveries and ideological changes not only in geometry but in all of mathematics. 
All this occurred in the short period between 1868 and 1872. 

In 1868 Beltrami, who had familiarized himself with the work of Lobachevski, 
used the methods of differential geometry to establish the then surprising result 
that the intrinsic geometry of the pseudosphere, a surface with constant negative 
curvature, is locally hyperbolic. While doing this he also introduced an incomplete 
model of the hyperbolic plane in the interior of the unit disk. The gaps in 
Beltrami’s model were filled in 1871 by Felix Klein, who arrived at his disk model 
along a projective route. The Beltrami-Klein disk model showed that hyperbolic 
geometry is as consistent as Euclidean geometry. As a result, the status of Euclidean 
geometry as a unique system of absolute geometric truths was destroyed once and for 
all. 

The multiplicity of systems called geometries—Euclidean, projective and hyper- 
bolic geometries, the geometries of surfaces in space, the geometries introduced by 
Riemann—gave rise to the question of what is a geometry. In his Erlangen 
Program of 1872 Klein gave a comprehensive (but not all-embracing) answer to 
this question by defining a geometry as the totality of invariants of the subsets of a 
set with respect to a group of permutations of that set. 

“Without knowing of Klein’s work, Henri Poincaré expressed similar ideas in 
1880. He too was interested in [hyperbolic] geometry, and was aware of its 
usefulness in connection with the theory of differential equations. He picked up 
Beltrami’s idea that on [a surface of constant curvature] it is possible to move 
figures without deforming them, and added that these motions form a group. 
Partly because of its visionary imprecision, his paper had a tremendous impact; it 
made the role of groups in geometry known far and wide.” [6] Some twenty years 
later, the demonstrated importance of groups in Galois theory, in geometry, and in 
analysis paved the way for group theory as a distinct area of mathematics. 

Hyperbolic motions play a vital role in the theory of automorphic functions, 
initially developed in the 1880s by Klein and Poincaré (see Chapter 1 in [7] and the 
paper [8] (which also discusses recent work involving so-called hyperbolic mani- 
folds)). They also play a key role in 4-dimensional Minkowskian geometry, the 
mathematical setting of the special theory of relativity first presented by Minkowski 
in lectures in 1905. Specifically, the group of motions of $H^3$ (= hyperbolic
3-space) is isomorphic to the group of homogeneous motions (= homogeneous
Lorentz transformations) of Minkowskian 4-space (see Chapter 7 in [9]).

As a postscript to this account of the effects of the discovery of hyperbolic 
geometry we might add that it had a liberating effect not only on mathematics but 
also on mathematicians. As H. Weyl put it, “the individual mathematician feels 
free to define his notions and to set up his axioms as he pleases.” 


(c) Peano’s axioms and “the greatest intellectual discovery of the 20th century.” 
In the late 19th century, mathematicians managed to axiomatize arithmetic, and 
therefore, in a sense, all of mathematics. The first such axiomatization was 
achieved by Dedekind in 1888. Peano, working independently, published his 
clearer axiomatization of arithmetic a year later. This was the last triumphant step 
in a kind of “backward development”: from the complex to the simple. Specifically,
in the 1830s Hamilton gave a rigorous definition of the complex numbers in
terms of the reals, and in the 1870s Dedekind defined the reals as cuts¹ in the
system of rationals, the field of quotients of the integers. Modern textbooks usually
reverse the historical process and go by rigorous steps from the realm of the
discrete to the realm of the continuous, from the natural numbers to the real and
complex numbers.


Descriptions of Some Technical Terms


¹Cuts. Split the rational numbers into two nonempty classes A and B such that every
element of A is less than every element of B and such that B has no least element. Every such
pair (A, B) is called a (Dedekind) cut.

There are natural definitions of addition and multiplication of cuts that make them into an
isomorphic replica of the real number system.


²Cardinals. The cardinal number |A| of a set A is, in some sense, a measure of its size. In
fact, in the case of a finite set A, |A| is just the number of its elements.

If there is a 1-1 correspondence between sets A and B, we write |A| = |B|. If A is finite or
|A| = |N| (where N denotes the natural numbers), then we say that A is countable. Otherwise
A is uncountable.


³ and ⁵ Well-ordered sets and ordinals. For some ordered infinite sets, the natural numbers
suffice to describe the positions of the elements. For example, in the usual ordering 1, 2, 3, ... of
the natural numbers, each number is both an element of the ordered set and a description of its
position in the ordering.

Now consider the ordering 1, 3, 5, ..., 2, 4, 6, ... of the natural numbers. Here we run out of
natural numbers after describing the positions of the odd numbers (1 is in the first position, 3 in
the second, and so on). Cantor proposed the symbol $\omega$ for the position of 2, $\omega + 1$ for the
position of 4, and so on.

Other orderings of the natural numbers led Cantor to introduce still more order symbols. For
the ordering 3, 6, 9, ..., 1, 4, 7, ..., 2, 5, 8, ... (that is, first all numbers of the form 3k, then of the
form 3j + 1, and finally of the form 3i + 2) he used $\omega \cdot 2$ for the position of 2, $\omega \cdot 2 + 1$ for the
position of 5, and so on.

Each of the orderings of the natural numbers which Cantor considered has the property that
every nonempty subset has a least element. (Note that the integers with their usual ordering do
not have this property.) He called ordered sets with this property well-ordered, and the new
symbols he introduced to describe position in such orderings of the natural numbers, countable
ordinals.


⁴The continuum is the set of all real numbers.


⁶The axiom of choice states that given a family of nonempty disjoint sets, a set can be
constructed containing exactly one element from each set in the family.

A remarkable insight into the nature of the system of Peano’s axioms, and 
therefore of mathematics, was achieved by Kurt Gödel in 1931. To describe it, we
begin with certain preliminaries about systems of axioms. 

We want a system of axioms to be consistent, that is, free of contradictions. If an 
axiom is implied by the other axioms then we can dispense with it, so it is natural 
to require each axiom to be independent of the others. Another property of a 
system of axioms is its completeness. This means that we have enough axioms to 
decide the truth or falseness of each statement of the system. 

While we would like to know that we are working with a consistent system of 
axioms, we don’t always strive for completeness; for example, the usual group 
axioms are not complete. On the other hand, it would be nice to know that
Peano’s axioms, the usual axiomatic basis of arithmetic, form a complete axiom set. 
This brings us to what is arguably the greatest intellectual discovery of the 20th 
century, namely the Gödel Incompleteness Theorems. They were discovered in
1931 by the 25-year-old Kurt Gödel, who proved that


For any consistent and finitely axiomatizable formal system F which contains the natural 
number system [with + and ·] there are undecidable propositions in F. One such undecidable
proposition is the consistency of F. 


F. de Sua described these remarkable insights in the following witty manner: 


Suppose we loosely define a religion as any discipline whose foundations rest on an element of 
faith, irrespective of any element of reason which may be present. Quantum mechanics for 
example would be religion under this definition. But mathematics would hold the unique 
position of being the only branch of theology possessing a rigorous demonstration of the fact that 
it should be so classified. [3] 


The one island of presumed certainty of human thought was proved uncertain. 


(d) The Zermelo-Fraenkel axiomatization of set theory and Paul Cohen’s indepen- 
dence results. What we refer to as naive (= pre-axiomatic) set theory was greatly 
advanced by Georg Cantor between 1872 and 1897. Unlike the post-Zeno Greeks, 
Cantor accepted the actual infinite without hesitation. He made sets the ultimate 
components of all things mathematical and provided a calculus of sets of arbitrary 
“size.” By the end of the century his results enjoyed wide acceptance. Then came 
the difficulties in the form of paradoxes (Burali-Forti, Russell, and others) and 
such seemingly intractable problems of set theory as the problem of the continuum 
hypothesis (is the cardinality² of the set of countable ordinals³ equal to the
cardinality of the continuum?⁴) and the problem of well-ordering⁵ the continuum.
These two problems troubled most mathematicians more than the paradoxes, 
which they viewed as somewhat esoteric difficulties. 

The question of well-ordering the continuum was solved in 1904 by Zermelo, 
who showed that if one accepts the axiom of choice,⁶ then all sets can be
well-ordered. But the axiom of choice had “side effects”: it led to various
paradoxical subdivisions of figures (for example, a ball can be subdivided into five 
pieces (one of which is a single point) which can be reassembled into two balls 
each congruent to the original ball). The problem of the continuum hypothesis 
remained intractable. 

Many of the logical difficulties associated with Cantor’s set theory were over- 
come as a result of Zermelo’s axiomatization, introduced by him in 1908 and later 
refined by A. Fraenkel, T. Skolem, and Zermelo himself. While its consistency is 
unprovable (because it effectively includes Peano’s axioms), it is accepted by most 
mathematicians as a foundation for all mathematics more basic than Peano’s 
axioms. The axiom of choice is now generally accepted. The continuum hypothesis 
remains open. In 1938, Gödel showed that these two are consistent both with each
other and with the other axioms of set theory. In 1963 Paul Cohen did the same for 
their independence. This means, among other things, that mathematicians are free 
to adopt different mathematics! [4] 


(e) A summary. It is useful to juxtapose key past and present views. 

Until the discovery of hyperbolic geometry it was thought that postulates and 
axioms are abstractions from experience and, together with their logical conse- 
quences, are at least approximately true of certain objects in the real world. 
Consistency of the postulates and axioms was taken for granted. These were “gut
feelings” as well as “official” views.

The discovery of hyperbolic geometry initiated revolutionary changes in these
views of a factual as well as of a philosophical nature. There are now many 
axiomatic systems, including a whole hierarchy of set theories. The “official” view
of postulates (or axioms—we now use the terms interchangeably) is that they are 
assumptions about some undefined primitive terms, hence results based on them 
are relative logical truths devoid of any outer physical meaning. The consistency of 
mathematics, whether we base it on Peano’s axioms or on the Zermelo-Fraenkel 
axioms, is in principle unprovable. Just as the discovery of the independence of the
parallel postulate split geometry in two, so too, more than a century later, the 
discovery of the independence of the axiom of choice and the continuum hypothesis
from one another and from the remaining axioms of set theory split mathematics.
So much for facts and “official” views. Now we come to feelings.

It is safe to say that almost every mathematician is at least a “residual Platonist,”
and this makes him more or less the intellectual brother of the ancient Greek 
mathematicians and an “emotional” opponent of formalism. Dieudonné described 
one variant of this syndrome in the following words: 


On foundations we believe in the reality of mathematics, but of course, when philosophers attack 
us with their paradoxes we rush to hide behind formalism and say “mathematics is just a 
combination of meaningless symbols... .” Finally we are left in peace to go back to our 
mathematics and do it as we have always done, with the feeling each mathematician has that he 
is working with something real. This sensation is probably an illusion, but it is very convenient. 
That is Bourbaki’s attitude toward foundations. (Quoted in [1].) 


(The Platonism of other prominent mathematicians is more robust than that of 
Bourbaki.) 
PROBLEMS AND SOLUTIONS


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


Proposed problems should be sent to the MONTHLY PROBLEMS address given on 
the inside front cover. Please include solutions, relevant references, etc. Three copies 
are requested. 


Solutions of published problems should arrive before June 30, 1995 at the MONTHLY 
PROBLEMS address given on the inside front cover. Solutions should be typed with 
double spacing, including the problem number and the solver’s name and mailing 
address. Two copies suffice. A self-addressed postcard or label should be included if
an acknowledgement is desired.


An asterisk (*) after the number of a problem, or part of a problem, indicates that no 
solution is currently available. Partial solutions will be useful in such cases. Otherwise,
the published solution is likely to be based on a solution which is complete and 
correct. Of course, an elegant partial solution or a method leading to a more general 
result is always useful and welcome. In addition, references to other appearances 
of MONTHLY problems or to solutions of these problems in the literature are also 
solicited. 


PROBLEMS 


10424. Proposed by Ira Gessel, Brandeis University, Waltham, MA.


Evaluate the sum

$$\sum_{0 \le k \le n/3} 2^k \, \frac{n}{n-k} \binom{n-k}{2k}.$$


10425. Proposed by Allen Barnes, Queensborough Community College, Bayside, NY.


A circle of radius r is centered at the point (c, 0). Whether or not the sine wave
y = A sin(wx + b) hits the circle (i.e., touches or passes through it) depends on the
values of r, c, A, w and b. Suppose that A is much larger than r and that b is chosen
uniformly at random between 0 and 2π. Find the asymptotic behavior as r → 0 of the
probability of a hit.


10426. Proposed by Noam Elkies, Harvard University, Cambridge, MA, and Irving Kaplansky, 
Mathematical Sciences Research Institute, Berkeley, CA. 

Show that any integer can be expressed as a sum of two squares and a cube. Note that 
the integer being represented and the cube are both allowed to be negative. 
10427. Proposed by George Soules, CCR/IDA, Princeton, NJ.


Let A be an n by n positive semi-definite Hermitian matrix. Write A = L + D + L* where
L is lower triangular with zero diagonal, and D is the diagonal of A (and L* is the complex
conjugate transpose of L). If det(D) ≠ 0, show that all n roots of det(zL + zD + L*) = 0 lie
in the unit disk |z| ≤ 1. Also, determine when this polynomial can have a root with |z| = 1.


10428. Proposed by Jet Wimp, Drexel University, Philadelphia, PA. 


Let $a_n$, $\phi_n$ be positive constants with

$$\sum_{n=1}^{\infty} a_n \ \text{convergent, and} \quad \phi_n = O\!\left(\frac{1}{\log n}\right).$$

Show that $\sum a_n^{1-\phi_n}$ converges.


10429. Proposed by Erwin Just, Bronx Community College (Emeritus), Bronx, NY. 
Let p ≡ 1 (mod 4) be a prime. Set


p-l 4 
1 A 1 oC 
jk 2 d - = — 
dS T= Re md DED 


with A, B, C and D integers, and gcd(A, B) = gcd(C, D) = 1.
(a) Prove that p|A if and only if p|C.
(b) Obtain an analogous result for p ≡ 3 (mod 4).
(c) Find examples to show that these results are not vacuously true. 


10430. Proposed by Fred Galvin, University of Kansas, Lawrence, KS, and John Isbell, SUNY,
Buffalo, NY.

Let $D(a_1, \ldots, a_k)$ denote the sum of the absolute deviations of the real numbers
$a_1, \ldots, a_k$ from their median. Call a sequence balanced if the n − 1 quantities

$$D(a_1, \ldots, a_k) + D(a_{k+1}, \ldots, a_n) \qquad (0 < k < n)$$

are all equal.

(a) Show that, for each integer n > 1, a nonconstant balanced sequence of n terms exists,
and is unique up to an affine transformation.

(b) Characterize the positive integers n for which there exists a strictly increasing balanced
sequence of n terms.


NOTES 


(10428) Recall that $\phi_n = O(f_n)$ means that there is a constant K such that $|\phi_n| \le K |f_n|$ for
all but finitely many n (thus allowing a sloppy definition of $f_n$). The limits of summation,
from 1 to ∞, have not been written in the second sum, but this should cause no confusion.
(10430) For given $a_1, \ldots, a_k$, the quantity $\sum_i |a_i - x|$ is minimized when x is the median
of the $a_i$. Some examples of balanced sequences are: (length 5) 0, 2, 3, 4, 6; (length 6)
0, 1, 2, 2, 3, 4; (length 15) 0, 8, 12, 12, 14, 14, 14, 15, 16, 16, 16, 18, 18, 22, 30.
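
The balance condition in Problem 10430 is easy to test by machine. The following is a minimal Python sketch, not part of the column; the function names `abs_dev`, `split_sums` and `is_balanced` are illustrative choices. It uses the fact that the sum of absolute deviations is the same for any choice of median when the number of terms is even.

```python
from statistics import median

def abs_dev(values):
    """Sum of absolute deviations of `values` from their median."""
    m = median(values)
    return sum(abs(v - m) for v in values)

def split_sums(seq):
    """D(a_1..a_k) + D(a_{k+1}..a_n) for k = 1, ..., n-1."""
    return [abs_dev(seq[:k]) + abs_dev(seq[k:]) for k in range(1, len(seq))]

def is_balanced(seq):
    """True when all n-1 split sums coincide."""
    return len(set(split_sums(seq))) == 1

if __name__ == "__main__":
    for example in ([0, 2, 3, 4, 6], [0, 1, 2, 2, 3, 4]):
        print(example, split_sums(example), is_balanced(example))
```

The inputs in the examples are integers, so every deviation is a multiple of one half and the equality test is exact; for general real data a tolerance would be needed.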
SOLUTIONS


Special Perfect Numbers 


10230 [1992, 570]. Proposed by Peter L. Montgomery, University of California, Los Angeles, 
CA, and John L. Selfridge, Northern Illinois University, DeKalb, IL. 


Find all perfect numbers of the form $n^n + 1$, where n is a positive integer.


Solution by Douglas Iannucci & Graeme L. Cohen, Temple University, Philadelphia, PA. 
The only solution is $28 = 3^3 + 1$. Let $N = n^n + 1$.

Suppose first that n is odd, so N is even. Euler proved that even perfect numbers have
the form $N = 2^{q-1}(2^q - 1)$, where $2^q - 1$ is prime. Since

$$n^n + 1 = (n+1)(n^{n-1} - n^{n-2} + \cdots - n + 1),$$

and these factors are relatively prime, the even number n + 1 must be the factor $2^{q-1}$, and
therefore $2^q - 1 = 2(n+1) - 1$. This implies $n^n + 1 = (n+1)(2n+1) = 2n^2 + 3n + 1$,
the only solution of which is n = 3.

Next, suppose that n is even, so N is odd. Then $n^n$ is a square and $n^n \equiv -1 \pmod{N}$.
Thus $p \equiv 1 \pmod 4$ for any p|N. In particular, N is not divisible by 3. Thus, by J.
Touchard, “On prime numbers and perfect numbers”, Scripta Math. 19 (1953), 35-39,
$N \equiv 1 \pmod{12}$ and 6|n.

Write $N = x^6 + 1$, where $x = n^{n/6} > 1$. This factors as $N = (x^2+1)(x^4 - x^2 + 1)$. Since
$x^4 - x^2 + 1 \equiv 3 \pmod{x^2+1}$ and 3 divides x, these factors are relatively prime. As usual,
let σ(m) denote the sum of the divisors of m. The perfection of N implies σ(N) = 2N.
Since σ(m) is a multiplicative function, we have

$$2N = \sigma(N) = \sigma(x^2+1)\,\sigma(x^4 - x^2 + 1).$$

Since N is odd, one of the factors on the right must be odd. This implies that either $x^2 + 1$ or
$x^4 - x^2 + 1$ is a square. However, $x^2 < x^2 + 1 < (x+1)^2$ and $(x^2 - 1)^2 < x^4 - x^2 + 1 < (x^2)^2$,
so there are no solutions in this case.
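
As a sanity check on the statement (not a substitute for the proof), the small cases can be tested directly. The sketch below is an illustration only; `sigma` and `perfect_of_form` are hypothetical helper names, and trial division is used simply because the numbers involved are small.

```python
def sigma(m):
    """Sum of all positive divisors of m, via trial-division factorization."""
    total = 1
    p = 2
    while p * p <= m:
        if m % p == 0:
            term, power = 1, 1
            while m % p == 0:
                m //= p
                power *= p
                term += power      # accumulates 1 + p + p^2 + ...
            total *= term
        p += 1
    if m > 1:                      # one prime factor may remain
        total *= 1 + m
    return total

def perfect_of_form(limit):
    """All (n, n**n + 1) with 1 <= n <= limit for which n**n + 1 is perfect."""
    return [(n, n**n + 1) for n in range(1, limit + 1)
            if sigma(n**n + 1) == 2 * (n**n + 1)]

if __name__ == "__main__":
    print(perfect_of_form(10))
```

For n up to 10 the search returns only the pair (3, 28), in agreement with the solution.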

Editorial comment. This result for even N is explicitly mentioned in A. Makowski,
“Remark on perfect numbers”, Elem. Math. 17 (1962), 109.

The solution of John P. Robertson was similar to the selected solution, while Carl
Pomerance and Anatoly Izotov took a different approach to the case of even n, writing n = rs
with $r = 2^t$ and s odd. The case s = 1 is ruled out by a separate argument. Otherwise,
$n^r + 1$ is a proper divisor of N, relatively prime to its complementary factor. As above,
this leads to the equation


x+1 
The solutions to this equation are known (see W. Ljunggren, “Noen setninger om ubestemte
likninger av formen $\frac{x^n - 1}{x - 1} = y^q$”, Norsk Mat. Tidsskrift 25 (1943), 17-20) and none satisfy
the other conditions of this problem.

At the 1990 Western Number Theory Conference, John Selfridge raised this question 
for odd N, having already solved it for even N. During the conference, Peter Montgomery 
solved the problem. This led to both being listed as proposers. As we have seen, the key step 
is to show that 3|n when n is even. Peter Montgomery’s proof was based on the observation
that $N \equiv 2 \pmod 3$ if n is even and $3 \nmid n$, and hence, for every d|N, one has $d + N/d \equiv 0 \pmod 3$.
This leads to 3|σ(N), contradicting the assumptions.


Solved also by A. Izotov (Russia), C. Pomerance, and J. Robertson. Seven incomplete or incorrect solutions were 
received. 
Applications of a Convergence Test for Fourier Series


10236 [1992, 571]. Proposed by M. J. Pelling, University College, London, England. 


(a) Let $f \in L^1(\mathbb{R})$ have period 2π. Suppose that, for a given x and s, the function
$\phi(u) = f(x+u) + f(x-u)$ is differentiable in an interval (0, δ), and that $\lim_{u \to 0^+} \phi(u) = 2s$
and $\lim_{u \to 0^+} u\phi'(u) = 0$. Prove that the Fourier series for f converges to s at x.

(b) Give an example for which the test in (a) succeeds while de la Vallée Poussin’s test 
(and a fortiori Jordan’s and Dini’s tests) fails. 

(c) Let $f(x) = \sum c_n x^n$ be a real power series such that $\sum c_n$ converges. By Abel’s
theorem, it follows that f is continuous on [0, 1]. Construct an example where f(x) fails
to be of bounded variation on [0, 1].


Solution by the proposer. The three parts will be dealt with in order. 

(a) In proving convergence, there is no loss of generality in assuming s = 0, since the 
general case reduces to this on replacing f(x) by f(x) —s. 

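Replace δ by a smaller value if necessary, so that φ(u) and H(u) = uφ′(u) are bounded.
In particular, let |H(u)| < K and |φ(u)| < K for 0 < u < δ. Given ε > 0 choose
0 < a < δ so that |H(u)|, |φ(u)| < ε in (0, a]. Consider the Dirichlet integral,

$$
\begin{aligned}
F(p) &= \int_0^{\delta} \phi(u)\,\frac{\sin pu}{u}\,du \\
&= \frac{1-\cos p\delta}{p\delta}\,\phi(\delta)
 + \frac{2}{p}\int_0^{\delta} \frac{\sin^2 \tfrac12 pu}{u^2}\,\phi(u)\,du
 - \frac{2}{p}\int_0^{\delta} \frac{\sin^2 \tfrac12 pu}{u}\,\phi'(u)\,du \\
&= \frac{1-\cos p\delta}{p\delta}\,\phi(\delta)
 + \int_0^{pa/2} \frac{\sin^2 v}{v^2}\Bigl(\phi\bigl(\tfrac{2v}{p}\bigr) - H\bigl(\tfrac{2v}{p}\bigr)\Bigr)\,dv
 + \int_{pa/2}^{p\delta/2} \frac{\sin^2 v}{v^2}\Bigl(\phi\bigl(\tfrac{2v}{p}\bigr) - H\bigl(\tfrac{2v}{p}\bigr)\Bigr)\,dv
\end{aligned}
$$

on substituting u = 2v/p. Since $\int_0^{\infty} (\sin^2 v)/v^2 \, dv = \pi/2$ it follows that

$$|F(p)| \le \frac{1-\cos p\delta}{p\delta}\,|\phi(\delta)| + \varepsilon\pi + 2K \int_{pa/2}^{p\delta/2} \frac{\sin^2 v}{v^2}\,dv < 4\varepsilon \quad \text{for } p > p_0(\varepsilon).$$

So $\lim_{p \to \infty} F(p) = 0$ and the Fourier series for f(x) converges to s at x.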


(b) If $\phi(u) = \dfrac{\sin \log u}{\log u}$ in (0, δ] the conditions of the test are met but we show below that
$\psi(t) = \frac{1}{t}\int_0^t \phi(u)\,du$ is not BV (of bounded variation) in any interval (0, a], hence de la
Vallée Poussin’s test fails.

Setting $t = e^{-v}$, $u = e^{-v-y}$, $0 < y < \infty$,

$$\psi(t) = \Psi(v) = \int_0^{\infty} \frac{\sin(v+y)}{v+y}\,e^{-y}\,dy$$

and it must be shown that Ψ(v) is not BV in any interval [A, ∞). But if v = 2kπ, k ≥ 1,
it is easy to see that $\Psi(2k\pi) > B_1/k$ for an absolute constant $B_1 > 0$, and similarly
$\Psi((2k+1)\pi) < -B_2/k$ for a constant $B_2 > 0$. So the total variation of Ψ(v) in any
interval [2kπ, ∞) is not less than $\sum_{n=k}^{\infty} (B_1 + B_2)/n = +\infty$ and hence Ψ(v) is not BV in
any interval [A, ∞).

(c) We prove a theorem from which examples are easily constructed. Let h(z) be analytic
in $|\Im(z)| < \pi/2$, $\Re(z) < \log 2$ and satisfy $\lim_{z \to \infty} h(z) = \lim_{z \to \infty} h'(z) = 0$ as $z \to \infty$
in this half-strip (i.e., $\Re(z) \to -\infty$). Set f(z) = h(log(1 − z)), f(1) = 0, which is
defined and continuous in the disc |z| ≤ 1, and also analytic in |z| ≤ 1 save for the point
z = 1. Hence in |z| < 1, f(z) admits a power series representation $f(z) = \sum_{n=0}^{\infty} c_n z^n$
where, by uniform continuity of f(z) on |z| ≤ 1, $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(e^{i\theta})\,e^{-in\theta}\,d\theta$, n ≥ 0, and
$c_{-n} = 0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(e^{i\theta})\,e^{in\theta}\,d\theta$, n ≥ 1, are the Fourier coefficients of $f(e^{i\theta})$.


Theorem. The Fourier series $\sum_{n=0}^{\infty} c_n e^{in\theta}$ converges to $f(e^{i\theta})$ at all points of |z| = 1, and in
particular $f(1) = \sum_{n=0}^{\infty} c_n = 0$. Furthermore,

$$f(x) = \sum_{n=0}^{\infty} c_n x^n \in BV[0,1] \iff \int_0^1 |f'(x)|\,dx < \infty \iff \int_0^{\infty} |h'(-x)|\,dx < \infty.$$


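Proof. By analyticity of f on |z| = 1, z ≠ 1, it follows $\sum_{n=0}^{\infty} c_n e^{in\theta} = f(e^{i\theta})$ for θ ≠ 0.
Near θ = 0 consider

$$
\begin{aligned}
\phi(\theta) &= f(e^{i\theta}) + f(e^{-i\theta}) - 2f(1) = h\bigl(\log(1 - e^{i\theta})\bigr) + h\bigl(\log(1 - e^{-i\theta})\bigr) \\
&= h\Bigl(-\bigl|\log\bigl(2\sin\tfrac{\theta}{2}\bigr)\bigr| - i\bigl(\tfrac{\pi}{2} - \tfrac{\theta}{2}\bigr)\Bigr)
 + h\Bigl(-\bigl|\log\bigl(2\sin\tfrac{\theta}{2}\bigr)\bigr| + i\bigl(\tfrac{\pi}{2} - \tfrac{\theta}{2}\bigr)\Bigr)
\end{aligned}
$$

for θ > 0. Apply the test of part (a) to φ(θ). Clearly φ(θ) → 0 as θ → 0 since
h(z) → 0 as z → ∞. Also,

$$
\theta\phi'(\theta) = \tfrac12\bigl(\theta\cot\tfrac{\theta}{2} + i\theta\bigr)\,h'\Bigl(-\bigl|\log 2\sin\tfrac{\theta}{2}\bigr| - i\bigl(\tfrac{\pi}{2} - \tfrac{\theta}{2}\bigr)\Bigr)
 + \tfrac12\bigl(\theta\cot\tfrac{\theta}{2} - i\theta\bigr)\,h'\Bigl(-\bigl|\log 2\sin\tfrac{\theta}{2}\bigr| + i\bigl(\tfrac{\pi}{2} - \tfrac{\theta}{2}\bigr)\Bigr) \to 0
$$

as θ → 0 since h′(z) → 0 as z → ∞. So the test applies and $\sum_{n=0}^{\infty} c_n e^{in\theta} = f(e^{i\theta})$ also at θ = 0.

It is well known that if f(x) is differentiable in (0, 1) then f ∈ BV[0, 1] if and only if
$\int_0^1 |f'(x)|\,dx < \infty$. Since

$$\int_0^1 |f'(x)|\,dx = \int_0^1 |f'(1-x)|\,dx = \int_0^1 \frac{|h'(\log x)|}{x}\,dx = \int_0^{\infty} |h'(-x)|\,dx$$

the theorem follows.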

Solutions to part (c) follow on taking h(z) as any function satisfying the above conditions,
with $\int_0^{\infty} |h'(-x)|\,dx = \infty$, and putting f(x) = h(log(1 − x)). For example, h(z) =
(sin z)/z, f(x) = (sin log(1 − x))/(log(1 − x)) serves since

$$\int_0^{\infty} |h'(-x)|\,dx \ge \int_1^{\infty} |h'(x)|\,dx \ge \int_1^{\infty} \frac{|\cos x|}{x}\,dx - \int_1^{\infty} \frac{|\sin x|}{x^2}\,dx = +\infty.$$


No other solutions were received.
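
The unbounded variation of the example in part (c) can be seen numerically. With the change of variable x = 1 − e^{−t}, the function f(x) = (sin log(1 − x))/(log(1 − x)) becomes g(t) = (sin t)/t, and the variation of f on [0, 1 − e^{−T}] equals the variation of g on [0, T]. The following Python sketch is an illustration under that substitution; the names `g` and `variation` and the grid step are assumptions, and the grid sum is only a lower bound for the true variation.

```python
import math

def g(t):
    """sin(t)/t, with the removable singularity at t = 0 filled in."""
    return 1.0 if t == 0.0 else math.sin(t) / t

def variation(T, step=0.01):
    """Sum of |g(t_{i+1}) - g(t_i)| over a uniform grid on [0, T]."""
    n = int(round(T / step))
    total = 0.0
    prev = g(0.0)
    for i in range(1, n + 1):
        cur = g(i * step)
        total += abs(cur - prev)
        prev = cur
    return total

if __name__ == "__main__":
    for T in (10, 100, 1000):
        print(T, variation(T))
```

Each tenfold increase in T adds roughly (2/π) log 10 to the sum, which is the logarithmic growth one expects from the divergent integral of |cos x|/x.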


Almost Equidistant Vertices 


10269 [1992, 958]. Proposed by D. M. Bloom, Brooklyn College, CUNY, Brooklyn, NY. 


Prove that there is a constant K < 1 with the following property. Let S be a regular
(2m + 1)-gon inscribed in the unit circle, and let any point P ∈ S be given; then there are
distinct vertices $V_0$ and $V_1$ of S, such that $|d(P, V_0) - d(P, V_1)| < K/m$.


Solution by Robin J. Chapman, University of Exeter, Exeter, U. K. For convenience put
$\alpha = 2\pi/(2m+1)$. Choose Cartesian coordinates with origin at the centre of the circumscribing
circle of S and such that P has coordinates (−a, 0) where 0 ≤ a ≤ 1. Now, there exists
a vertex $V_0$ of S having coordinates (cos θ, sin θ) with |θ| ≤ α/2. By symmetry we may
assume that θ ≥ 0. Let $V_1$ be the vertex of S having coordinates (cos(θ − α), sin(θ − α)).
Note that α/2 ≤ |θ − α| ≤ α. As

$$d\bigl(P, (\cos\phi, \sin\phi)\bigr)^2 = a^2 + 2a\cos\phi + 1$$

is a decreasing function of |φ| when 0 ≤ |φ| ≤ π then

$$0 \le d(P, V_0) - d(P, V_1) \le d\bigl(P, (1, 0)\bigr) - d\bigl(P, (\cos\alpha, \sin\alpha)\bigr) = a + 1 - \sqrt{a^2 + 2a\cos\alpha + 1} = f(a)$$

say. Now

$$f'(a) = 1 - \frac{a + \cos\alpha}{\sqrt{a^2 + 2a\cos\alpha + 1}}$$

which is positive when a = 0 and can only vanish if $(a + \cos\alpha)^2 = a^2 + 2a\cos\alpha + 1$.
This is impossible as sin α ≠ 0. Thus, f′(a) > 0 for all a ∈ [0, 1]. Hence if m ≥ 1, then
$f(a) \le f(1) = 2 - \sqrt{2 + 2\cos\alpha} = 2(1 - \cos(\alpha/2))$. Now as m ≥ 2, α/2 ≤ π/5 < 1 and
so $1 - \cos(\alpha/2) < \alpha^2/8$ (first term of Taylor expansion, which is an alternating series with
terms decreasing in absolute value). Hence $|d(P, V_0) - d(P, V_1)| < \pi^2/(2m+1)^2$ and as the
function $m \mapsto m/(2m+1)^2$ is decreasing for m ≥ 1 then if m ≥ 2, $|d(P, V_0) - d(P, V_1)| < L/m$
where $L = 2\pi^2/25 < 1$.

It only remains to consider the case when m = 1. Divide the triangle S into four
equilateral triangles by drawing lines between the midpoints of the sides. If P lies inside
the central triangle take $V_0$ and $V_1$ to be any distinct vertices of S. We may assume that
$d(P, V_0) \ge d(P, V_1)$. Now $d(P, V_0) \le 3/2$ (when P is the midpoint of the side of S
opposite $V_0$), and $d(P, V_1) \ge 3/4$ (when P is the midpoint of the side of the central triangle
nearest $V_1$). Hence $|d(P, V_0) - d(P, V_1)| < 3/4 < 1$. Now if P lies in the small triangle
containing the vertex V of S let $V_0$ and $V_1$ be the other vertices of S and assume that
$d(P, V_0) \ge d(P, V_1)$. Now $d(P, V_0) \le \sqrt{3}$ (when P = V), and $d(P, V_1) \ge \sqrt{3}/2$ (when
P is the midpoint of $VV_1$). Hence $|d(P, V_0) - d(P, V_1)| < \sqrt{3}/2 < 1$. In general

$$|d(P, V_0) - d(P, V_1)| < K/m$$

where $K = \max(2\pi^2/25,\ 3/4,\ \sqrt{3}/2) = \sqrt{3}/2$.

Note that the analysis for m ≥ 2 does not give the best possible result as this method
works for all P inside the unit circle, and not just for P in S. One can get a slightly better
result with a lot more work. Note that the bound obtained is $O(m^{-2})$. Similarly, the result
for the triangle could be improved by maximizing $|d(P, V_0) - d(P, V_1)|$ over a suitable set
instead of working with the individual distances.
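
The bound for m ≥ 2 can be probed numerically. The sketch below is an illustration only: it samples P on a square grid clipped to the closed unit disc (the region the argument for m ≥ 2 actually covers), and the names `min_gap` and `worst_gap` and the grid resolution are assumptions. The smallest difference between two vertex distances is the smallest gap between consecutive entries of the sorted distance list.

```python
import math

def min_gap(px, py, m):
    """Smallest |d(P,V_i) - d(P,V_j)| over distinct vertices of the regular (2m+1)-gon."""
    n = 2 * m + 1
    dists = sorted(
        math.hypot(px - math.cos(2 * math.pi * k / n),
                   py - math.sin(2 * math.pi * k / n))
        for k in range(n)
    )
    return min(b - a for a, b in zip(dists, dists[1:]))

def worst_gap(m, steps=40):
    """Largest min_gap over grid points lying in the closed unit disc."""
    worst = 0.0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            x, y = i / steps, j / steps
            if x * x + y * y <= 1.0:
                worst = max(worst, min_gap(x, y, m))
    return worst

if __name__ == "__main__":
    for m in range(2, 6):
        print(m, worst_gap(m), math.pi ** 2 / (2 * m + 1) ** 2)
```

On such a grid the sampled maximum stays below π²/(2m + 1)², as the solution predicts; a grid search can of course only suggest, not establish, the sharp constant.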


Editorial comment. As noted when the problem was proposed, this problem sought to
sharpen an upper bound of the form $1/m - A/m^3$ found in Problem A-5 on the 1989
Putnam examination. In fact, the statement was far from sharp, since upper bounds of the
form $Km^{-2}$ can be obtained. Larry Crone and Richard Holzsager observed that it is not
necessary to restrict to polygons with an odd number of sides, and showed that there are
vertices $V_0$ and $V_1$ of a regular n-gon inscribed in the unit circle such that


2 
\d(P, Vo) — d(P, V})| < < 


for all P in the unit circle. 


Solved also by L. Crone & R. Holzsager, M. Golomb, O. P. Lossers (The Netherlands), GCHQ Problem Solving 
Group (U. K.), and the proposer. 
Perimeters of Inscribed Polygons


10275 [1993, 75]. Proposed by Murray S. Klamkin and A. Liu, University of Alberta, Edmonton, 
Alberta, Canada. 


Let A be a regular n-gon with edge length 2. Denote the consecutive vertices $A_0, \ldots,
A_{n-1}$ and introduce $A_n$ as a synonym for $A_0$. Let B be a regular n-gon inscribed in A with
vertices $B_0, \ldots, B_{n-1}$ where $B_i$ lies on $A_iA_{i+1}$ and $|A_iB_i| = \lambda < 1$ for 0 ≤ i < n. Also
let $C_i$ be the point on $A_iA_{i+1}$ with $|A_iC_i| = \alpha_i \le \lambda$ for 0 ≤ i < n and let C denote the
n-gon, also inscribed in A, with vertices $C_0, \ldots, C_{n-1}$.


With P(F) denoting the perimeter of the figure F, prove that P(C) ≥ P(B).


Solution I by Albert Nijenhuis, University of Pennsylvania (Emeritus), Philadelphia, PA, and
University of Washington, Seattle, WA. We show below that Area(C) ≥ Area(B). B and C
are both n-gons, and B has minimal perimeter for its area, while C may or may not have this
property. Since minimal perimeter for n-gons is an increasing function of area, it follows
that P(C) ≥ P(B).


Lemma. Area(C) ≥ Area(B).

Proof. We replace C by a sequence of at most n n-gons, each of them obtained from
the previous by replacing a vertex $C_i$ by $B_i$; the new vertex is then denoted $C_i$. Such a
replacement is area-reducing. Indeed the half lines $A_iA_{i+1}$ and $B_{i-1}B_{i+1}$ intersect; as a
consequence $A_iA_{i+1}$ and $C_{i-1}C_{i+1}$ intersect. It follows that $B_i$ is closer to $C_{i-1}C_{i+1}$ than
$C_i$ is, so $\mathrm{Area}(\triangle C_{i-1}C_iC_{i+1}) \ge \mathrm{Area}(\triangle C_{i-1}B_iC_{i+1})$.


Solution IT by Roy Barbara, Lebanese University, Fanar, Lebanon. First we formulate a 
method for comparing lengths. 


Lemma. Let ABCD be a convex broken line. Assume AB = CD and that the angles at B
and C are equal. Denote by I, J and K the midpoints of AB, BC and CD, respectively. Let
R be between A and I, T between B and J, and U between C and K. Let S also lie on BT.
Then RS + SU ≥ RT + TU.


Proof. 


Let V be the reflection of U across BC. Denote by O the intersection of RV and BC.
Using similar triangles, it is clear that O is between J and C. Thus, T is inside the triangle
RSV. Therefore: RS + SU = RS + SV ≥ RT + TV = RT + TU.

Now we apply the lemma to solve the problem. Consider the n-gon C: a first application of
the lemma to $A_{n-1}A_0A_1A_2$ (R, S, T, U being $C_{n-1}$, $C_0$, $B_0$, $C_1$ resp.) means that replacing
the vertex $C_0$ by $B_0$ will decrease the perimeter of C. More generally, if we denote by $F_i$
the n-gon with vertices $B_0, \ldots, B_i, C_{i+1}, \ldots, C_{n-1}$ (0 ≤ i ≤ n − 1), by repeated use of the
lemma, we obtain $P(C) \ge P(F_0) \ge P(F_1) \ge \cdots \ge P(F_{n-1})$. Since $F_{n-1}$ is B, the proof
is complete.

Note that we proved a more general result: the n-gon B need not be regular; it is only
necessary that $|A_iB_i| \le \tfrac12 |A_iA_{i+1}|$.
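
The inequality P(C) ≥ P(B) lends itself to a quick randomized check. The following Python sketch is illustrative and not drawn from either solution; it assumes the reading that each $C_i$ lies between $A_i$ and $B_i$ on its side, and the helper names (`regular_polygon`, `inscribed`, `perimeter`, `check`), the number of trials and the tolerance are all arbitrary choices.

```python
import math
import random

def regular_polygon(n, edge=2.0):
    """Vertices A_0..A_{n-1} of a regular n-gon with the given edge length."""
    radius = edge / (2.0 * math.sin(math.pi / n))
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def inscribed(vertices, offsets, edge=2.0):
    """Point at distance offsets[i] from A_i along the side A_i A_{i+1}."""
    n = len(vertices)
    points = []
    for i, t in enumerate(offsets):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        s = t / edge
        points.append((x0 + s * (x1 - x0), y0 + s * (y1 - y0)))
    return points

def perimeter(points):
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def check(n, lam, trials=200, seed=0):
    """True if P(C) >= P(B) for `trials` random choices of offsets in [0, lam]."""
    rng = random.Random(seed)
    a = regular_polygon(n)
    p_b = perimeter(inscribed(a, [lam] * n))
    return all(
        perimeter(inscribed(a, [rng.uniform(0.0, lam) for _ in range(n)])) >= p_b - 1e-12
        for _ in range(trials)
    )

if __name__ == "__main__":
    print(check(3, 0.9), check(5, 0.5), check(8, 0.99))
```

The small tolerance guards against rounding when some offsets happen to fall very close to λ.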


Solved also by J. Fukuta (Japan), O. P. Lossers (The Netherlands), H. M. Marston, A. D. Melas (Greece), R. M. 
Robinson, A. A. Tarabay (Lebanon), A. Tissier (France), J. C. Vera Lizcano (student, Colombia), A. N. ’t Woord (The 
Netherlands), and the proposer. 


Collaborating editors: David F Appleyard, Paul T. Bateman, Bruce C. Berndt, Duane 
M. Broline, Barry W. Brunson, Frank S. Cater, Gulbank D. Chakerian, Underwood Dudley, 
Gerald A. Edgar, Michael A. Filaseta, Ira M. Gessel, Richard A. Gibbs, Jerrold R. Griggs, 
Douglas A. Hensley, John R. Isbell, Mourad E. H. Ismail, Murray Klamkin, Daniel J. Kleitman, 
Frederick W. Luttmann, Frank B. Miles, Richard Pfiefer, Stephen L. Portnoy, J. O. Shallit, John 
Henry Steelman, Kenneth B. Stolarsky, David E. Tepper, Douglas B. Tyler, Daniel Ullman, and 
William E. Watkins. 
Carl Runge on Algorithmics


Suppose the mathematician gives them a method of 
calculation perfectly logical and conclusive but taking 
200 years of incessant numerical work to complete. 
They would be justified in thinking that this is not much 
better than no method at all. So there arises a third 
stage of the solution of a mathematical problem in 
which the object is to develop methods for finding the 
result with as little trouble as possible. I maintain that 
this third stage is just as much a chapter of mathematics 
as the first two stages, and it will not do to leave it to the 
astronomer, to the physicist, to the engineer or whoever 
applies mathematical methods, for this reason that 
these men are bent on the results and therefore they 
will be apt to overlook the full generality of the methods 
they happen to hit on, while in the hands of the mathe- 
matician the methods would be developed from a higher 
standpoint and their bearing on other problems in other 
scientific inquiries would be more likely to receive the 
proper attention. 


From Graphical Methods by Carl Runge, 
Columbia University Press, 1912. 


Submitted by Steve Maurer 
Calculus in the Operating Room
Pearl Toy, M.D., and Stan Wagon 


Here is a realistic, and potentially important, application of a familiar topic from calculus. 
Imagine a hospital patient about to undergo surgery. Suppose he has 5 liters (L) of blood 
in his body, 40% of which consists of red blood cells (this percentage is called the 
hematocrit), and, during the surgery, he will bleed 2½ liters of blood. This is a realistic
estimate for certain types of hip replacement, for example. His blood volume is maintained 
at 5 L by controlled injection of saline solution (no blood cells), which we assume to mix 
instantaneously with his blood. This means that the blood lost through bleeding becomes 
less rich in red cells as the operation progresses. 


Question 1. What is the patient's volume of red blood cells at the end of 
the operation? 


Some of the lost blood can be recovered, washed, and returned to the patient after 
the operation; but there is some loss and washing is an expensive procedure. Suppose that, 
before the operation, some blood is removed from the patient and replaced with saline 
solution. This blood will be returned to the patient afterward. This procedure, called acute 
normovolemic hemodilution (ANH), will decrease the loss of red blood cells during the 
operation. However, during the transfusion the patient's total blood volume is maintained 
at 5 L; as with the bleeding during surgery, this affects the rate of red blood cell removal. 


Question 2. If it is known that the patient's hematocrit can go as low as 
20%, but no lower, how much blood should be replaced in the ANH 
procedure just described? 


Both of these questions can be answered by a simple exponential decay model, once 
one makes the observation that the rate of blood cell loss during the operation is proportional 
to the amount of red cells present. For the numbers given, and with time measured as a 
fraction of the length of the operation, the proportionality constant is ½, since 2.5 of the 5 L
are lost.

Now both questions can be answered. If f(t) is the volume of red blood cells
remaining at time t, then $f(t) = f(0)e^{-t/2}$. For Question 1, $f(1) = 2000/\sqrt{e} \approx 1213$ milliliters.

For Question 2 we need to know what value of f(0) will cause $f(0)/\sqrt{e}$ to be 1000 ml
(20% of 5 L); this is $f(0) = 1000\sqrt{e}$. In order to figure out how much should be removed by
ANH, we reverse the technique just discussed and solve $2000e^{-k/5000} = 1000\sqrt{e}$ to get
k = 966 ml. This leaves $1000\sqrt{e}$, or 1649 ml of red blood cells in the patient for a hematocrit
of 33%, which will become 20% after surgery.

Note that without the transfusion the red blood cell loss is 2000-1213 = 787 ml. 
With the transfusion the patient starts with 1649 ml of red blood cells and ends up with 
1000; a loss of 649 ml. There is a net savings of 138 ml of red blood cells. This savings 
may in fact not be large enough to justify the procedure; it must be balanced with overall 
expense, risks associated with ANH, and the risk of an adverse reaction to blood the patient 
may have to receive from a blood bank after the operation (see M. E. Brecher and 
M. Rosenfeld, Mathematical and computer modeling of acute normovolemic hemodilution, 
Transfusion 1994 (34), 176-179). But it is noteworthy that a simple freshman-calculus model 
applies to the basic situation. 
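
The figures quoted in this note can be reproduced in a few lines. The Python sketch below simply evaluates the exponential decay model with the numbers given in the text; the constant and function names are illustrative, and nothing here goes beyond the simple model described above.

```python
import math

BLOOD_VOLUME_ML = 5000.0   # total volume, held constant by saline
HEMATOCRIT = 0.40          # initial fraction of red blood cells
BLEED_ML = 2500.0          # blood lost during surgery
MIN_HEMATOCRIT = 0.20      # lowest tolerable hematocrit

def red_cells_after(initial_red_ml, removed_ml):
    """Red cell volume left after `removed_ml` of well-mixed blood is replaced by saline."""
    return initial_red_ml * math.exp(-removed_ml / BLOOD_VOLUME_ML)

initial_red = HEMATOCRIT * BLOOD_VOLUME_ML                    # 2000 ml
after_surgery = red_cells_after(initial_red, BLEED_ML)        # Question 1

floor_red = MIN_HEMATOCRIT * BLOOD_VOLUME_ML                  # 1000 ml
needed_at_start = floor_red * math.exp(BLEED_ML / BLOOD_VOLUME_ML)
anh_volume = BLOOD_VOLUME_ML * math.log(initial_red / needed_at_start)  # Question 2

# Red cells lost without ANH minus red cells lost with ANH.
saved = (initial_red - after_surgery) - (needed_at_start - floor_red)

if __name__ == "__main__":
    print(round(after_surgery), round(needed_at_start), round(anh_volume), round(saved))
```

Rounded to the nearest milliliter, the script gives 1213, 1649, 966 and 138, matching the values in the text.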


Moffitt-Long Hospital
University of California
San Francisco, CA

Department of Mathematics
Macalester College
St. Paul, MN
Experimentation and Conjecture
Are Not Enough 


Deborah Tepper Haimo 


Dedicated to the Memory of Franklin Tepper Haimo 


This is an exciting time in mathematics. Its various special areas are coming 
together to emphasize the discipline’s unity. In addition, there is general growing 
recognition that good teaching cannot be separated from research, and that we 
must be more successful in communicating the tenets of our field to the broader 
community. 

In underscoring the beauty of mathematics as well as its relevance, it is of 
utmost importance that we educate those naturally gifted and interested, as well as 
others who may have latent ability which should not be ignored, and the many who 
need to learn at least the basic concepts and to gain some appreciation and 
understanding of the field’s vitality. 

It is heartening to see that we are beginning to acknowledge our responsibility 
to become involved in raising the understanding of the mathematical knowledge of 
all our citizenry. We need 


• creative investigators at the forefront of research to advance the field through
  major breakthroughs,
• our professionals to apply developed theories to practical applications,
• our instructional sector at every level, from kindergarten teachers to university
  research professors, to educate our youth,
• our amateurs to generate and maintain public interest in our field,
• and a broad, mathematically literate citizenry to appreciate its importance and
  power and its need for public support.


It is daunting to contemplate how extensive a challenge we face, and how much 
there is to accomplish. 

We may recall that, not very long ago, mathematics was described, on the one 
hand, as the queen of the sciences, and on the other, as the handmaiden of the 
sciences. It is interesting to note that, in these characterizations, our discipline is 
identified as “feminine”, although there seems to be some question about her 
social status, ranging from one extreme to the other. 

Today, mathematics is generally viewed on a more neutral level with the 
sciences, neither in a lofty regal position nor in a lowly subservient one. It is 
important, however, that it continue to maintain its unique quality. 

As we enter an era of great change, we need to recognize and stress the distinct 
nature of mathematics that differentiates it from all other sciences. In particular, 
as we return, in our educational approach, to the historic emphasis on problem- 
solving, along with our current focus on experimentation and conjecture, we must 
be careful to appreciate the fact that, laudable as this approach is, it is a start, but
not enough!

While the introduction of problem-solving is intended to involve students more
actively in their learning and understanding of the nature of mathematics (and
that is certainly a positive factor), there is another vital aspect, the need for
proof, unique to mathematics, and yet all too often ignored in our educational
process. Problem-solving is not complete until the results have been firmly
established. Proofs are an integral part of mathematics and must not be overlooked!

It is important that, in our efforts to make mathematics more relevant, attrac- 
tive, and accessible to a greater number of students, we avoid discarding its 
fascinating and surprising features. Since these features must have attracted most 
of us, it seems reasonable to believe that they would appeal also to a segment of 
our students. 

We cannot afford to neglect the education of those students who are drawn to 
what generally are thought to be the more challenging aspects of mathematics; we 
cannot assume that they will manage well without our help and direction. As one 
of them once plaintively exclaimed “just because some of us are considered able 
doesn’t mean that we don’t need guidance; we flounder and become discouraged 
too!” 

In order that we have better success in interesting most students, problem-solving
with experimentation and conjecture may be a reasonable approach. At
the same time, we must also expose our students to proofs, and must strive for a 
reasonable balance. We must be careful not to distort the subject by removing its 
major distinguishing component and eliminating the essence of its uniqueness. 
Whereas in some areas, substantiation of a theory for a large number of instances 
is considered adequate for acceptance, not so in mathematics. 

I remember that when I was a mathematics undergraduate many years ago, we 
used to consider physics majors as coming to conclusions that were invariably 
sloppy. Our favorite description of physicists, as distinguishable from mathemati- 
cians, was that they believe that all odd numbers are prime, and that they reach 
that result by the argument that 3 is a prime, 5 is a prime, 7 is a prime, 9 is an 
experimental error, 11 is a prime,...,  q.e.d. 

Of course we exaggerated, but we had a point that always applies to mathematics.
In mathematics, there is no middle ground; it is “all or nothing”! Looking at a
number of examples and recognizing relationships among elements may lead to
interesting and useful conjectures, but they remain no more than that, or, to
quote a familiar refrain, “’tain’t necessarily so”. A conjecture becomes accepted as
fact only after it has been confirmed by an acceptable proof, or else a counterexample
must be found to reject it.

In the current reform movement in mathematics education, there is recognition 
of the importance of stressing that experimentation and conjecture lie at the very 
core of the field. There is also mention of the need for justification in some form, 
at appropriate levels, before a presumed conclusion can be accepted. Unfortu- 
nately, particularly in introductory courses, the problem-solving aspects are em- 
phasized greatly while proofs of results are generally barely mentioned or ignored 
entirely. 

The stress on problem-solving, where students are encouraged to look for 
patterns and draw conclusions, merits our applause. That is the nature of mathe- 
matics; that is when experimentation and conjecture occur. It is, however, only a 
beginning, and this must be made unmistakably clear if we are serious about 
educating our students fully. 
Depending on the educational context, a rigorous proof may not be in order,
nor may it be necessary initially to present more than an intuitive outline. Both 
teacher and student must be thoroughly aware, however, of the limitations of what 
has been accomplished—no more than the validity of the finite number of special 
cases considered. 

What appears to hold and what seems totally plausible may actually not be so in 
general. The crux of mathematics is lost if that fact is not clearly understood and 
appreciated, and all our students must recognize this. If we are to educate them, 
we need to convey to all the essence of mathematics. It is imperative that we not 
only address the needs of those who seek relevance, but also, maintain the interest 
of those able to understand the abstract nature of mathematics and to appreciate 
its beauty; these students, too, must be encouraged to use their abilities to the 
fullest to continue as leaders to the forefront of our field. 

As our students engage in problem-solving, we must point out that examples 
abound of conjectures that fail to hold beyond some point. Let us recall, for 
example, Euclid’s indirect proof, as generally given in an introductory course in 
algebra, that there are an infinite number of primes. That proof might suggest that 
a way to construct primes is to multiply all those up to a certain value and then to 
add 1. We thus would start with the first prime 2 to get 


2 + 1 = 3; 

and would continue, getting the following successive results 

2 · 3 + 1 = 7; 
2 · 3 · 5 + 1 = 31; 
2 · 3 · 5 · 7 + 1 = 211; 
2 · 3 · 5 · 7 · 11 + 1 = 2,311. 


Since 3, 7, 31, 211, and 2311 are all primes, a conjecture that this algorithm will 
generate only primes would seem reasonable at this point. Unfortunately, we next 
have 
2 · 3 · 5 · 7 · 11 · 13 + 1 = 30,031, 
and 
30,031 = 59 · 509, 
so that conjecture fails on the sixth attempt. 
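
The experiment just described is easy to repeat by machine. The following is a minimal sketch, assuming plain trial division is adequate for numbers this small; the helper names `smallest_factor` and `primorial_plus_one` are illustrative and not part of the original discussion.

```python
def smallest_factor(n):
    """Return the smallest prime factor of n (n itself when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def primorial_plus_one(primes):
    """Yield (value, smallest factor) for each running product of primes, plus 1."""
    product = 1
    for p in primes:
        product *= p
        value = product + 1
        yield value, smallest_factor(value)


results = list(primorial_plus_one([2, 3, 5, 7, 11, 13]))
for value, factor in results:
    if factor == value:
        print(value, "is prime")
    else:
        print(value, "=", factor, "*", value // factor)
```

The first five values pass the primality check, and the sixth is reported with its factor 59, in agreement with the computation above.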

In mathematical work, experimentation, often with specific concrete examples to 
test a conjecture, is the norm, though this is generally hidden from sight. As one 
faculty member once remarked, “mathematicians’ experiments end in the waste 
basket!” Divulged are not the false starts and the difficulties encountered, nor the 
messy computations, but only the carefully executed final results. Such omissions, 
sometimes without even any public indication of the insight and motivation that 
led to a particular conclusion, are regrettable, and help to create an aura of an 
incomprehensible subject. 

Students, for the most part, engage in solving problems whose solutions are 
known, even if not to them. It is thus of utmost importance that teachers, 
especially, be in a position to provide useful guidance. They must know whether 
a proof or a counterexample exists for a given problem; 
otherwise, they may not realize that their students’ experiments have not been 
carried out far enough to determine whether there is a valid basis for an algorithm 
that holds in general for that problem. 

We are able to disprove our first example rather quickly, but not all conjectures 
can be so readily discarded. That is why it is essential to establish general validity. 




Let us turn, for example, to a problem proposed by George Polya, acknowledged 
as one of the most renowned of problem-solvers. 
In 1919, Polya made an interesting observation. He let 


n represent any positive integer, 

r, the number of prime factors of n, counting multiplicities, and taking r = 0 
when n = 1, and r = 1 when n is a prime, 

$O_x$ (x an integer), the number of positive integers ≤ x with an odd number 
of prime factors, 

$E_x$, the number of positive integers ≤ x with an even number of prime 
factors. 


Polya conjectured that, for x ≥ 2, 

$$O_x \ge E_x.$$ 

Further, if 

$$L(x) = E_x - O_x,$$ 

then 

$$L(x) \le 0.$$ 


The function L can be written as a sum 

$$L(x) = \sum_{n=1}^{x} \lambda(n), \quad x \ge 1,$$ 

where λ(n) is the Liouville function 

$$\lambda(n) = (-1)^r.$$ 

The Polya conjecture is not an arbitrary exercise, but an attempt to relate the 
Liouville function to the distribution of prime factors. Its validity for the first 50 
consecutive positive integers is readily established by hand, and many, many 
further cases confirm this observation. Indeed, the conjecture was generally 
deemed true for nearly 40 years, until 1958, when C. B. Haselgrove proved that 
L(x) > 0 for infinitely many x. He failed, however, to produce any specific x for 
which the conjecture fails. 
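
How persuasive the early evidence is can be seen from a short computation. The sketch below tabulates L(x) as the running sum of the Liouville function, using a smallest-prime-factor sieve to count prime factors with multiplicity; the function name and the bound of 100,000 are choices made for this illustration only.

```python
def liouville_partial_sums(limit):
    """Return a list L with L[x] = lambda(1) + ... + lambda(x) for 0 <= x <= limit."""
    # spf[n] will hold the smallest prime factor of n.
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    # omega[n] counts the prime factors of n with multiplicity (omega[1] = 0).
    omega = [0] * (limit + 1)
    for n in range(2, limit + 1):
        omega[n] = omega[n // spf[n]] + 1
    sums = [0] * (limit + 1)
    for n in range(1, limit + 1):
        sums[n] = sums[n - 1] + (-1) ** omega[n]
    return sums


L = liouville_partial_sums(100_000)
print(L[1:11])      # L(1), ..., L(10)
print(max(L[2:]))   # largest value of L(x) for 2 <= x <= 100000
```

Throughout this range L(x) never rises above 0 once x ≥ 2, which is exactly the kind of finite evidence that made the conjecture so convincing.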

In 1962, R. S. Lehman found that 


L(906,180,359) = 1, 


and in 1980, M. Tanaka discovered that the smallest counterexample of the Polya 
conjecture occurs when 

x = 906,150,257. 
Thus a very promising conjecture fell by the wayside! Note, however, that this did 
not occur for the first 906,150,256 integers! Indeed, although a counterexample 
may not appear before the trillionth case, or even much later, it is enough for 
disqualification. 

Mathematics provides for other disciplines the compactness and simplicity of 
language that allows useful descriptions of fundamental results and natural phe- 
nomena. Indeed, to most of the outside world, the importance of mathematics lies 
in its utilitarian role. 

Those outside the discipline rarely appreciate the subject’s intrinsic beauty or 
marvel at its great power. That power, for such broad and diverse applications, is 
derived from its abstract nature, the very characteristic that instills awe, but 
alienates many of those merely interested in the concrete and the applicable. 

While the need to make the subject more relevant may be inescapable, it should 
not be so all-encompassing that the essence of mathematics as a major discipline in 
its own right is totally lost. It is a field with problems that attract and fascinate 
some with no interest in the applied or the relevant. 

Many of those who have worked in such areas as number theory had no intent 
to solve practical problems. Yet, as we know, their results have sometimes 
produced incredible applications many years later. We must thus make sure that 
we do not fail to captivate and nurture students with such a bent who may later 
contribute to the advancement of the discipline itself. These will be the researchers 
of tomorrow who will continue to develop the theoretical basis which sometimes 
leads to unexpected and important applications that benefit all of us. 

There are many examples in number theory of conjectures that fail to hold. 
These generally have a universal appeal, since they involve positive integers, and 
are readily understood. As is well known to mathematicians, many are extremely 
difficult to prove conclusively. Indeed, attempts to prove some seemingly simple 
problems, using known techniques, are often fruitless. In some cases, however, 
these concerted efforts have led to the creation of new methods and the introduc- 
tion of new fields of far greater significance and impact, both to mathematics and 
its applications, than the original rather specialized quest might have indicated. 

A famous example, of course, is Fermat’s last theorem. When, after some 350 
years, a proof was recently announced by Andrew Wiles of Princeton University, 
the imagination of the general public was aroused to such an extent that a popular 
presentation, arranged at the San Francisco Exploratorium, sold out quickly and 
even had scalpers charging many times the regular cost of a ticket! 

Simply stated, the Fermat conjecture generalizes the well known Pythagorean 
theorem by asserting that there are no positive integers x, y, z, and n > 2, such 
that 

$$x^n + y^n = z^n.$$ 


As is well known, Fermat claimed to have found what he described as a “‘marve- 
lous” proof of his conjecture that there exist no non-trivial solutions to the 
problem. He explained that he omitted the proof due to his inability to fit it into 
the margin of his book. There are serious doubts that his proof was valid. Indeed, 
it would take more than Fermat’s book margin just to list the names of all those 
who have made significant contributions in the course of trying to prove the 
conjecture. 

In his attempts to settle the question, Wiles himself appealed to results of 
elliptic curves, Galois representations, and modular forms in order to make 
substantial headway. His announced version still is not entirely complete. He 
expects, however, that in the near future, he will have the needed final computa- 
tion of a precise upper bound for the Selmer group in the semistable case. 

Some of the techniques already developed in earlier years have proved useful in 
dealing with related problems. For example, after establishing, in 1769, that the 
special Fermat equation 


$$x^3 + y^3 = z^3$$ 


has no non-trivial integer solutions, Euler made a more general conjecture surmis- 
ing that the equation 

$$x_1^n + x_2^n + \cdots + x_{n-1}^n = x_n^n$$ 

has no positive integer solutions $x_1, x_2, \ldots, x_n$. 

For n = 5, a counterexample was produced by a direct computer search in 1966 
by L. J. Lander and T. R. Parkin. They found that 


$$27^5 + 84^5 + 110^5 + 133^5 = 144^5.$$ 


It is interesting to note that, when n = 4, the conjecture could neither be proved 
nor could a counterexample be found by a similar direct computer search. In 1987, 
however, Noam Elkies of Harvard University introduced a pencil of curves of 
genus 1 which lies on 


$$a^4 + b^4 + c^4 = d^4,$$ 

a, b, c, d integers, and found the simplest curve in the pencil which could possibly 
lead to a rational point that would disprove the Euler conjecture. He then used the 
computer and succeeded in establishing that 


$$2,682,440^4 + 15,365,639^4 + 18,796,760^4 = 20,615,673^4.$$ 


In disproving the Euler conjecture for n = 4, he not only established the result, 
but demonstrated the importance of mathematical analysis to enable him to take 
advantage of the computer. It was only after restricting the variables to lie on an 
appropriate curve that the solution was found, being beyond the range of earlier 
exhaustive searches. 

With this solution revealed, Elkies was then able to exploit the theory of elliptic 
curves to generate recursively arbitrarily many other solutions from that one. In a 
subsequent computer search, a minimal solution of the equation was found by 
Roger Frye of Thinking Machines to be 


$$95,800^4 + 217,519^4 + 414,560^4 = 422,481^4.$$ 
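
Although finding these counterexamples required insight and heavy computation, checking them takes only exact integer arithmetic. The following sketch verifies the three identities quoted above; the dictionary layout and the helper name `holds` are illustrative choices.

```python
# Each entry: (terms on the left, value on the right, common exponent).
identities = {
    "Lander and Parkin, n = 5": ([27, 84, 110, 133], 144, 5),
    "Elkies, n = 4": ([2682440, 15365639, 18796760], 20615673, 4),
    "Frye, n = 4": ([95800, 217519, 414560], 422481, 4),
}


def holds(terms, total, power):
    """Check sum(t**power) == total**power with Python's exact integers."""
    return sum(t ** power for t in terms) == total ** power


checks = {name: holds(*data) for name, data in identities.items()}
for name, ok in checks.items():
    print(name, "verified" if ok else "FAILED")
```

The contrast is instructive: verification is immediate, whereas the discovery for n = 4 lay beyond the range of direct search.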


Aside from number theory, elementary problems in other areas may also lead to 
erroneous conclusions if experimentation is terminated prematurely and no proof 
ensues. As an example, consider n points on a circle, connected pairwise by 
chords, no three of which are concurrent within the circle. The problem is to 
determine the number of distinct, non-overlapping regions, $R_n$, formed. 

We start by counting, and note that 


$R_1 = 1$ 
$R_2 = 2$ 
$R_3 = 4$ 
$R_4 = 8$ 
$R_5 = 16$ 


strongly suggesting the reasonable general algorithm 
$$R_n = 2^{n-1}, \quad n \ge 1.$$ 
To check this result, we turn to $R_6$ and find 
$$R_6 = 31.$$ 


Our careful count turned up no more than 31 regions, one short of the number 
predicted by our conjecture. 

Fortunately, we can analyze the situation mathematically, and note that we can 
invoke the familiar Euler formula 


F=1-V+E. 
Here we have the number of faces of a planar map in terms of the number of 
vertices and edges that determine those faces. Now, 


the faces F are the $R_n$ regions; 

the vertices V are the n points on the circle and the interior points i 
resulting from intersections of chords; 

the edges E are our n circular arcs and the chord segments formed by the 
interior points. 


Substituting into the Euler formula, and using an identity involving the binomial 
coefficients, we have 


$$R_n = C(n-1, 0) + C(n-1, 1) + C(n-1, 2) + C(n-1, 3) + C(n-1, 4),$$ 
the sum of the first 5 binomial coefficients in the expansion of 
$$2^{n-1} = (1 + 1)^{n-1} = C(n-1, 0) + C(n-1, 1) + C(n-1, 2) + \cdots + C(n-1, n-1).$$ 


It is now clear that the two agree only when n ≤ 5. 
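
The count can be reproduced numerically. The sketch below assumes the standard tallies for points in general position, which the text does not spell out: C(n, 4) interior intersection points (one for each choice of four points on the circle), and C(n, 2) + 2 C(n, 4) chord segments (each intersection adds one segment to each of the two chords through it). The function names are illustrative.

```python
from math import comb


def regions_via_euler(n):
    """Regions inside the circle from F = 1 - V + E."""
    vertices = n + comb(n, 4)               # circle points + interior intersections
    edges = n + comb(n, 2) + 2 * comb(n, 4)  # arcs + chord segments
    return 1 - vertices + edges


def regions_via_binomials(n):
    """Sum of the first five binomial coefficients C(n-1, k), k = 0..4."""
    return sum(comb(n - 1, k) for k in range(5))


for n in range(1, 9):
    print(n, regions_via_euler(n), regions_via_binomials(n), 2 ** (n - 1))
```

Under these assumptions the two formulas agree for every n, and both part company with the conjectured powers of 2 at n = 6.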
Pictorially, we have the Pascal triangle, with a section deleted after the fifth row 
of numbers, as shown below: 


A less tractable problem in this general area is the four-color map problem that 
has challenged mathematicians for many years. Indeed, G. D. Birkhoff once noted 
that every serious mathematician that he knew, had, at one time or other, tried to 
prove the four-color map problem. This problem claimed that, to differentiate 
among the countries of any map for which no contiguous regions meet in just one 
point, four is the minimum number of colors needed. 

First proposed in 1852 by Francis Guthrie, it gained substantial importance and 
attracted wide attention when a proof, published in 1879, was found to have a 
serious gap eleven years later. Many of the famous mathematicians who devoted 
much time and effort to seeking a solution, created new fields in the process, as is 
not uncommon. 

Substantial interest in the problem continued throughout the years. Indeed, 
during my student days, it was one of the foremost problems discussed. I remem- 
ber, particularly, that when Hassler Whitney was asked one day about how long a 
dissertation had to be, he responded that a two-line proof of the four color 
problem would suffice. 

Many years later, in 1976, Kenneth Appel and Wolfgang Haken produced a 
proof, but it not only was far longer than a mere two lines, its techniques were 
totally unorthodox. After reducing the problem to consideration of the characteris- 
tics of some 2000 different maps, they programmed a computer to determine the 
outcome. The use of a computer to solve a mathematical problem remains 
unacceptable to some, and it was expected that an analytic proof would soon be 
found. Now, nearly 20 years later, this has not occurred. 

In recent years, as our technology has developed, the introduction of computers 
has extended far beyond mere computer-assisted proofs to the point that there are 
advocates of using computers to provide probabilistic proofs as well as visual video 
proofs, thus changing the very nature of mathematics which has always relied on 
rigorous analytic proofs. 

While we must prepare our students to recognize the importance of computers, 
we must instruct them in their appropriate use. Using computers to suggest 
patterns or inspire intuition replaces old tools with new, but does not change the 
character of proof. To go further, however, and accept as valid results that have 
been shown to hold for some finite number of cases, however large, because it has 
not been possible to establish them in general is to engage in a different area. 

The high standards of rigor, introduced after informal reasoning created serious 
problems in the 18th century, have served us well, propelling our field forward at 
an impressive rate. We should not reverse that direction. 

One of the intriguing characteristics of mathematics is the existence of proper- 
ties that are totally counter to intuition. Some results can be convincingly proved, 
yet they confound the imagination as they seem to contradict what reason would 
seem to dictate. 

Such a situation is encountered early in a calculus course. It occurs when the 
curve, described by y = 1/x, is rotated about the x-axis for x ≥ 1. The resulting 
surface of revolution, known as Gabriel’s horn, has the surprising property that we 
can fill the horn with a finite amount of paint, but no matter how many times we 
might refill it, we never would have enough to paint its surface, a remarkable 
example of a body of finite volume and infinite surface area! 
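
Both claims follow from a short calculation. The worked equations below assume the standard calculus formulas for the volume and surface area of a solid of revolution, which are not derived in the text.

```latex
V = \pi \int_1^\infty \frac{dx}{x^2}
  = \pi \left[ -\frac{1}{x} \right]_1^\infty
  = \pi ,
\qquad
S = 2\pi \int_1^\infty \frac{1}{x} \sqrt{1 + \frac{1}{x^4}} \, dx
  \;\ge\; 2\pi \int_1^\infty \frac{dx}{x}
  = \infty .
```

The volume integral converges because its integrand decays like $1/x^2$, while the area integral is bounded below by the divergent integral of $1/x$.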

Mathematics demands a high level of precision in the concepts we introduce, as 
their validity must be upheld when they contradict our intuitive notions. This was 
certainly the case in 1890 when the mathematical community was electrified by 
G. Peano’s presentation of a space-filling curve. He established the existence of a 
continuous map f of the closed unit interval onto the closed unit square. One 
illustration is the following construction of a sequence of continuous functions $f_n$ 
which converge to the limiting function f sought. 

We start out by bending the unit interval into a right isosceles triangular 
segment, (i) below. 


Every such right isosceles triangular segment is further mapped into a chain 
of 4 such segments, each of 1/2 the length, with the chain beginning at the 
same initial point and ending at the same final point, as in (ii). 

Repeating the operation on each of the four right isosceles triangular 
segments, we have the next member of our sequence as shown in (iii). 


[Figure: stages (i), (ii), and (iii) of the construction.] 

We continue to repeat the operation recursively, noting that our path is made 
up of $4^n$ right isosceles triangular segments, each lying in a square of side 
$1/2^n$. Each of these paths represents a continuous function $f_n$, and covers 
more of the initial square. 


It is then not difficult to establish that the sequence $f_n$ converges to a 
continuous surjective function f. 

A computer can be used here for great effect to give a visual demonstration of 
the recursive procedure described. It will provide a clear picture to show the 
square being covered by the continuous curve progressively more and more until it 
appears completely covered. 

Strange as space-filling curves are, an even more counter-intuitive result was 
established by Felix Hausdorff in 1914 when he sought to determine whether there 
exists a non-trivial, finitely additive, congruent-invariant measure. He succeeded in 
establishing the non-existence of such a measure in three dimensional space when 
he proved that, neglecting a denumerable set, the surface of a sphere can be 
decomposed into a finite number of pieces and reassembled by rigid motions to 
form two spheres, each with the same radius as the original one. 

A decade later, Stefan Banach and Alfred Tarski proved the same property for 
a solid sphere, without the need to remove a denumerable set. They went even 
further, obtaining the more general result that, in three dimensional space, given 
any two bounded sets, with interior points, one of the sets can be decomposed into 
a finite number of disjoint subsets which can be reassembled to form the other set, 
with no gaps or overlaps. In short, the two sets are equidecomposable. 

Banach and Tarski succeeded in showing the fascinating and nonintuitive result 
that, say, the earth can be decomposed into a finite number of pieces which can be 
reassembled to form a marble, or even to form two earths each of the same size as 
the original. John von Neumann added to these amazing facts the observation that 
only nine pieces are needed for the decomposition of one sphere into two, all with 
the same radius. Raphael Robinson went further yet in 1947, showing that five 
pieces will suffice! 

In the plane, the Banach-Tarski Paradox fails to hold. We have, rather, that any 
two well-behaved planar sets with the same area are equidecomposable. Tarski 
conjectured that a closed disk can be cut up into a finite number of pieces which 
can be reassembled to form a square exactly, with no gaps or overlaps. This was 
proved in 1989 by the Hungarian mathematician Miklos Laczkovich. 

Rather than the small number of pieces into which one sphere in space may be 
cut in order to be reassembled into two, Laczkovich found that, in the plane, to 
effect the decomposition of the closed disk to form a square, it was necessary to 
have some $10^{50}$ pieces of a great variety of strange shapes. Remarkably, these 
pieces merely had to be translated to new positions to form a square exactly. 

The talent and persistence of mathematicians, using ever growing and more 
powerful mathematical tools, will ultimately succeed in resolving the questions 
posed in some of the outstanding conjectures. 

Left over from earlier centuries and one of the remaining major unsolved 
mathematics problems, the Riemann Hypothesis, has a relationship to analysis that 
corresponds to that of Fermat’s Last Theorem to arithmetic. If proved true, it 
would have significant consequences in number theory, further unifying seemingly 
independent areas of mathematics. The Hypothesis is particularly difficult to 
establish since there seems no rationale to indicate why it might be true. 

For a complex variable s, Riemann defines his zeta function, ζ, by the series 

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$ 


The Liouville function λ, which we encountered when we considered the Polya 
conjecture, is connected to Riemann’s zeta function by the equation 


$$\frac{\zeta(2s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\lambda(n)}{n^s}.$$ 
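
The relation between λ and ζ can be tested numerically at a convenient point. The sketch below takes s = 2 and assumes the classical values ζ(2) = π²/6 and ζ(4) = π⁴/90, which are not stated in the text, so that the right-hand side should approach π²/15. The function name and truncation point are illustrative.

```python
import math


def liouville_values(limit):
    """Return lam with lam[n] = (-1)**(number of prime factors of n, with multiplicity)."""
    spf = list(range(limit + 1))  # smallest prime factor of each n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0, 1] + [0] * (limit - 1)
    for n in range(2, limit + 1):
        # Removing one prime factor flips the sign.
        lam[n] = -lam[n // spf[n]]
    return lam


limit = 100_000
lam = liouville_values(limit)
partial = sum(lam[n] / n ** 2 for n in range(1, limit + 1))
exact = (math.pi ** 4 / 90) / (math.pi ** 2 / 6)  # zeta(4) / zeta(2) = pi^2 / 15
print(partial, exact)
```

Since the omitted tail is smaller than 1/100000 in absolute value, the two printed numbers should agree to about four decimal places. Agreement of this kind illustrates the identity but, in the spirit of this article, does not prove it.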


Riemann established that all zeros of ζ(s) in the right half plane lie in the unit 
vertical strip 0 < Re s < 1. He then stated his famous conjecture that all 
nontrivial zeros of ζ(s) lie on the line Re s = 1/2, an incredible conclusion for 
which there seems to be no discernible reason. 

In 1914, G. H. Hardy proved that there are infinitely many zeros on the line 
Re s = 1/2. Confirmation of the conjecture for a finite number of cases has been 
steadily increasing. In 1955, computers were used to substantiate the fact that 
Riemann’s conjecture holds for the first 25,000 zeros of the zeta function, all 
simple and all on the line Re s = 1/2. In 1969, it was found that this continues to 
be true for the first 3,500,000 zeros of the zeta function; i.e. they are all simple and 
lie on the line Re s = 1/2. 

These examples, finite in number as they are, provide increasingly convincing 
substantiation of the validity of the Riemann hypothesis. Indeed, the use of high 
speed computers has confirmed the conjecture for the first 1.5 billion zeros. 
Nonetheless, problems may arise as soon as computation lies beyond the range of 
any existing computer. That is why mathematics requires solid proofs and why the 
new technology can be so helpful, if used properly, either in providing counterex- 
amples or in aiding in establishing a conclusive positive proof of a result. 

Solutions of deep and difficult problems lead to the persistent public impression 
that to work in mathematics, at whatever level, requires superhuman talent, and 
that those so endowed can understand the incomprehensible and arrive at seem- 
ingly miraculous conclusions without great effort. 

As we all know, some may be more perceptive and able to see deeper results, 
but no one, at any stage, who does mathematics is spared much thought and 
greater effort. Experimentation and conjecture open the way and allow all to 
participate in, and benefit from, the adventure. 

We need to extend public awareness to the realization that mathematics is more 
than arithmetic calculations, algebraic manipulations, and Euclidean geometric 
proofs, and that studying the calculus is not the ultimate attainment. We must 
counter the general perception of mathematics as a static subject, and the image of 
mathematicians as technicians who can solve any problem. We need to raise 
awareness of the vitality of mathematics as a field with a myriad of unsolved 
problems and numerous diverse and uncharted areas to be conquered by 
imaginative and creative scholars! 

By adopting a problem-solving approach, we have a means of providing students 
with a significant mathematical experience and deeper understanding of the nature 
of the discipline. By introducing appropriate use of the new technology, with 
recognition of its limitations, we will be able to widen and enhance our students’ 
mathematical range. By emphasizing the importance of realizing that more is 
required in mathematics than mere experimentation and conjecture, and by 
expecting the ablest students to prove assertions and validate them, we can 
educate all to the full extent of their abilities. This will not only enrich their lives, 
but prepare them to take their place in, and contribute to, modern society, and 
assure the continuation of the discipline as a major force in the world. 


Pebbling a Chessboard 


Fan Chung, Ron Graham, John Morrison, 
and Andrew Odlyzko 


1. INTRODUCTION. The following puzzle has attracted some attention recently. 
We first learned of it through Martin Gardner [6]. A version of it appeared in 
Omni magazine in 1993 [11]. However, it was proposed over 10 years ago by 
Kontsevich [9], and a partial analysis of it was published shortly thereafter by 
Khodulev [8]. We begin with an infinite “chessboard” B covering the first 
quadrant. The cells of the board are labelled by integer coordinates (i, j) with i, j ≥ 0. 
Initially, a single “pebble” is located in cell (0, 0) (the lower left corner; see Figure 
1). The first step or “move” consists of replacing this pebble by two pebbles, 
located at cells (1, 0) and (0, 1), respectively. In general, a move will consist of 
removing some pebble, say in cell (i, j), and placing two pebbles on the board, in 
positions (i + 1, j) and (i, j + 1), provided each of these positions is not already 
occupied. 


Figure 1. The starting configuration on the board B. 


After k steps the board will have k + 1 pebbles on it. We call such 
configurations of pebbles reachable configurations. We will denote by R(k) the set of 
reachable configurations with k pebbles, and we set $R := \bigcup_{k \ge 1} R(k)$. In Figure 2, 
we show the eight possible reachable configurations with at most four pebbles. 

Figure 2. Reachable configurations with at most four pebbles. 

A little experimentation convinces one that in any reachable configuration, 
some pebble must occupy a cell having coordinates (i, j) with i + j ≤ 3. This fact 
first seems to have been noted by M. Kontsevich [9]. We give the “book” proof of 
this in the next section. If L(k) denotes the set (or “level”) {(i, j): i + j = k} then 
we can express the above assertion by saying that L(1) ∪ L(2) ∪ L(3) is 
unavoidable, i.e., any reachable configuration must always have some pebble in a cell in 
L(1) ∪ L(2) ∪ L(3). In general, an unavoidable set is one which intersects every 
reachable configuration. Of course if S is unavoidable and T ⊇ S then T is 
unavoidable. Let us call S a minimal unavoidable set if S is unavoidable but no 
proper subset of S is, and let M(k) denote the family of minimal unavoidable sets 
with k cells. 

In this note we will characterize the elements of M(k) and give a polynomial 
time algorithm for recognizing such elements. Many of these results were first 
proved by Khodulev [8], and we present them here for completeness, since the 
paper [8] is not widely available and contains only sketches of proofs. We will also 
determine the asymptotic growth rates of r(k) := |R(k)| and m(k) := |M(k)|, the 
sizes of R(k) and M(k), respectively, as k → ∞. (These results are all new.) It 
turns out that the analysis of r(k) and m(k) leads to some interesting problems in 
asymptotic enumeration. 

Further results on this problem, including generalizations to arbitrary partially 
ordered sets, have recently been obtained by Eriksson [4]. 


2. PROPERTIES OF UNAVOIDABLE SETS 
Lemma 1. [9] The set L(1) ∪ L(2) ∪ L(3) of all (i, j) with i + j ≤ 3 is unavoidable. 


Proof: To each cell (i, j) assign the weight $2^{-(i+j)}$. Observe that: 


(i) The total weight covered by pebbles in any reachable configuration is 1. 
This is so since the starting cell (0, 0) has weight 1, and a move does not 
change the weight of cells covered, i.e., 
$$2^{-(i+j)} = 2^{-(i+1+j)} + 2^{-(i+j+1)}.$$ 

(ii) The total weight of all cells in the board is $\sum_{i, j \ge 0} 2^{-(i+j)} = 4$. 

(iii) The total weight of the cells (i, j) with i + j ≤ 3 is 13/4. Thus, the weight of the 
remaining cells is only 3/4, and since that is less than 1, they cannot 
contain all the pebbles of a reachable configuration. Thus, L(1) ∪ L(2) ∪ 
L(3) is unavoidable. ∎ 
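
The weight argument lends itself to a quick experiment. The following sketch uses exact rational arithmetic to confirm the level weights and then plays random legal moves to watch the invariant in (i); the helper names, the number of moves, and the random seed are arbitrary choices for this illustration.

```python
import random
from fractions import Fraction


def weight(cell):
    """Weight 2^-(i+j) of the cell (i, j), as an exact fraction."""
    i, j = cell
    return Fraction(1, 2 ** (i + j))


def level_weight(k):
    """Total weight of the k + 1 cells (i, j) with i + j = k."""
    return sum(weight((i, k - i)) for i in range(k + 1))


def random_play(steps, rng):
    """Play `steps` random legal moves from the single pebble at (0, 0)."""
    pebbles = {(0, 0)}
    for _ in range(steps):
        # A move is legal only if both target cells are empty.
        legal = [(i, j) for (i, j) in sorted(pebbles)
                 if (i + 1, j) not in pebbles and (i, j + 1) not in pebbles]
        i, j = rng.choice(legal)
        pebbles.remove((i, j))
        pebbles.update({(i + 1, j), (i, j + 1)})
    return pebbles


low_levels = sum(level_weight(k) for k in range(4))   # cells with i + j <= 3
config = random_play(200, random.Random(0))
total = sum(weight(cell) for cell in config)
print(low_levels, total, min(i + j for i, j in config))
```

However the moves are chosen, the total weight stays at 1 and some pebble remains among the cells with i + j ≤ 3, as the lemma requires.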


However, L(1) ∪ L(2) ∪ L(3) is not a minimal unavoidable set. The following 
result was proved by Khodulev [8]. It was independently conjectured by Martin 
Gardner [6]. The proof given here is due to Harold Reiter [14]. 


Lemma 2. L(1) ∪ L(2) is unavoidable. 


Proof: As before, assign the weight $2^{-(i+j)}$ to the cell (i, j). Observe now that any 
reachable configuration C has exactly one pebble on each of the boundaries 
{(i, 0): i ≥ 0} and {(0, j): j ≥ 0}. Thus, the total weight which C can cover outside 
of L(1) ∪ L(2) is at most 

$$2 \cdot 2^{-3} + \sum_{\substack{i, j \ge 1 \\ i + j \ge 3}} 2^{-(i+j)} = 1.$$ 

This implies that if C is to avoid L(1) ∪ L(2), it must cover all these cells, which is 
impossible since C is finite. ∎ 
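
For readers who wish to check the value of the sum in this proof, here is one way to evaluate it; the grouping by levels is our own bookkeeping, using the fact that level k contains k − 1 cells with both coordinates at least 1.

```latex
\sum_{\substack{i, j \ge 1 \\ i + j \ge 3}} 2^{-(i+j)}
  = \sum_{k \ge 3} (k - 1)\, 2^{-k}
  = \sum_{k \ge 2} (k - 1)\, 2^{-k} - \frac{1}{4}
  = 1 - \frac{1}{4}
  = \frac{3}{4},
\qquad
2 \cdot 2^{-3} + \frac{3}{4} = 1 .
```

The term $2 \cdot 2^{-3}$ accounts for the single pebble on each boundary, whose weight is largest when it sits in level 3.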


However, L(1) U L(2) is not minimal either, as we will see later. 

We should observe that for any reachable configuration C, the set of moves 
needed for reaching C is unique. Only the order in which these moves are 
executed can vary in the different ways of reaching C. 

Suppose now that we relax the rules for moves by allowing the replacement of a 
pebble at (i, j) by pebbles at (i + 1, j) and (i, j + 1) even when these positions 
might already be occupied by pebbles. In other words, we allow the accumulation 
of multiple pebbles in cells during the process of reaching C. It might be helpful 
for this model to imagine that the pebbles first move onto the vertices of an 
infinite binary tree rooted at (0, 0). Then the $2^k$ vertices in the kth level of the 
tree are identified in the obvious way with the k + 1 cells in the kth level L(k) := 
{(i, j): i + j = k} of the board B. 

An easy induction argument now establishes the following result. 


Lemma 3. If a configuration of pebbles (with at most one pebble per cell) can be 
reached by moves which allow accumulations of pebbles in cells, then in fact it can 
also be reached by the “standard” moves, i.e., those which do not allow accumu- 
lation. 


Given a set X ⊂ B, we define the set M(X) of moves recursively as follows. 
Starting at level 0 and proceeding one level at a time by increasing levels, perform 
the moves required either to remove all pebbles from a cell in X, or to remove all 
but at most one of the pebbles from a cell not in X. Continue through the last 
level L(h(X)) containing a cell of X. 


Theorem 1. X ⊂ B is unavoidable if and only if after executing the moves in M(X), 
some cell contains at least 3 pebbles. 

Proof: Let m(i, j) denote the number of pebbles in cell (i, j) after executing 
M(X). 

(i) Suppose that X is avoidable and m(i, j) ≥ 3 for some (i, j). Thus, either 
m(i - 1, j + 1) ≥ 2 or m(i + 1, j - 1) ≥ 2. Assume m(i - 1, j + 1) ≥ 2 (the other 
case is similar). Hence, to reach any C ∈ R, we must move at least two pebbles off 
of (i, j), and at least one off of (i - 1, j + 1). But this will force (i, j + 1) to have 
at least 3 pebbles, and will force (i + 1, j) to have at least two. Thus, by induction, 
we can never reach an allowable configuration of pebbles (i.e., one in which no cell 
has more than one pebble), which is a contradiction. 

(ii) Suppose m(i, j) ≤ 2 for all (i, j) ∈ B. By the definition of M(X), 

m(i, j) is 
    ≤ 1 if (i, j) has level ≤ h(X), 
    ≤ 2 if (i, j) has level h(X) + 1, 
    = 0 if (i, j) has level > h(X) + 1. 

A simple induction argument now shows that the excess pebbles can all be 
(eventually) moved to achieve a reachable configuration in R. Hence X is 
avoidable. 

This completes the proof of Theorem 1. ∎ 


Note that this result furnishes a polynomial-time algorithm for determining if X 
is a minimal unavoidable set. 
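
The test in Theorem 1 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are our own, the example sets all include the starting cell (0, 0) (otherwise the starting configuration itself avoids the set), and minimality is checked by removing one cell at a time, which suffices because every superset of an unavoidable set is unavoidable.

```python
from collections import defaultdict


def pebble_counts_after_moves(X):
    """Execute M(X), allowing accumulation, and return the pebble count per cell."""
    X = set(X)
    counts = defaultdict(int)
    counts[(0, 0)] = 1
    if not X:
        return counts
    last_level = max(i + j for i, j in X)   # h(X)
    for level in range(last_level + 1):
        for i in range(level + 1):
            cell = (i, level - i)
            # Cells of X are emptied; other cells keep at most one pebble.
            keep = 0 if cell in X else min(counts[cell], 1)
            fired = counts[cell] - keep
            counts[cell] = keep
            counts[(i + 1, level - i)] += fired
            counts[(i, level - i + 1)] += fired
    return counts


def is_unavoidable(X):
    """Theorem 1: unavoidable iff some cell holds at least 3 pebbles after M(X)."""
    return any(c >= 3 for c in pebble_counts_after_moves(X).values())


def is_minimal_unavoidable(X):
    X = set(X)
    return is_unavoidable(X) and not any(is_unavoidable(X - {c}) for c in X)


def levels(max_level):
    """All cells (i, j) with i + j <= max_level."""
    return {(i, k - i) for k in range(max_level + 1) for i in range(k + 1)}


print(is_unavoidable(levels(3)), is_unavoidable(levels(2)), is_unavoidable(levels(1)))
print(is_minimal_unavoidable({(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)}))
```

Each call processes the cells level by level up to h(X), so the work is polynomial in the size of the region examined. With (0, 0) included, the sets of Lemmas 1 and 2 are reported as unavoidable, and the five-cell set in the last line is reported as minimal.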


3. RECURRENCES FOR MINIMAL UNAVOIDABLE SETS. Let f(k) denote the 
number of minimal unavoidable sets consisting of k cells. For j ≥ 0, define 
$B(j) = \bigcup_{i > j} L(i)$, the set of cells in levels exceeding j. Finally, for t ≥ 0, define 
$f_t(k)$ to be the number of minimal unavoidable sets with k cells (i.e., of size k) in 
B(t) where we start with the (multiple) pebble distribution of 1, 2, 2, ..., 2, 1 in 
L(t + 1), and 0 in all L(s), s > t + 1. As a convention, we take $f_t(k) = 0$ for all 
k < 0. Thus, $f(k) = f_0(k - 1)$ (since (0, 0) must be unoccupied), and f(k) = 0 for 
k < 4. We list a set of recurrences which suffice to determine all values of $f_t(k)$: 


(i) $f_0(k) = 2 f_0(k - 1) + f_1(k - 2)$; 
(ii) $f_1(k) = f_0(k) + 3 f_1(k - 1) + f_2(k - 2) + 4\delta(k, 2)$, where 
$\delta(i, j) = 1$ if i = j, and $\delta(i, j) = 0$ otherwise; 
(iii) For t ≥ 2, $f_t(k) = f_{t-1}(k) + 2 f_t(k - 1) + f_{t+1}(k - 2) + 2\delta(k, 1)\delta(t, 2)$. 


To see why these are valid, consider (i). In Figure 3(a) we have the starting 
configuration for $f_0(k)$. We consider the various possibilities as to whether or not 
various cells in L(1) are in a hypothetical minimal unavoidable set X of size k. If 
(1, 0) ∈ X but (0, 1) ∉ X then Figure 3(b) applies and X will consist of (1, 0) 
together with a minimal unavoidable set of size k - 1 arising from the two pebbles 
at (2, 0) and (1, 1). By definition, there are $f_0(k - 1)$ of these. The same argument 
applies if (1, 0) ∉ X, (0, 1) ∈ X (Figure 3(c)). On the other hand, if (0, 1) ∈ X and 
(1, 0) ∈ X then Figure 3(d) applies, and $f_1(k - 2)$ counts the number of ways of 
completing X. Thus, we have (i). 

The other recurrences (ii) and (iii) are explained in similar ways. In Table 1, we 
list some of the small values of f,(k). 
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Figure 3. 


TABLE 1. Values of f,(k). 
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4. ASYMPTOTICS OF NUMBER OF MINIMAL UNAVOIDABLE SETS. The 
recurrences of the last section are sufficient to determine values of f(k) for small 
k. For large values of k, we can obtain asymptotics of f(k) in a simple form: 
f(k) ~cy*" ask >», 

where y = 4.147899... and c = 0.01676... . More precise estimates (including 
definitions of y and c) are stated at the end of this section. Since y and c are 
algebraic numbers of degree 3, this estimate also shows that there is no simple 
expression for f(k). 
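
The remark that the recurrences determine $f(k)$ for small k can be tried out directly. The following is a minimal Python sketch, not the authors' program; it assumes the recurrences read $f_0(k) = 2f_0(k-1) + f_1(k-2)$, $f_1(k) = f_0(k) + 3f_1(k-1) + f_2(k-2) + 4\delta(k,2)$, and $f_t(k) = f_{t-1}(k) + 2f_t(k-1) + f_{t+1}(k-2) + 2\delta(k,1)\delta(t,2)$ for $t \ge 2$, with $f_t(k) = 0$ for $k \le 0$ and $f(k) = f_0(k-1)$. The function names `f_t` and `f` are chosen here for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f_t(t: int, k: int) -> int:
    """f_t(k) under the assumed recurrences (i)-(iii); zero for k <= 0."""
    if k <= 0:
        return 0
    if t == 0:  # recurrence (i)
        return 2 * f_t(0, k - 1) + f_t(1, k - 2)
    if t == 1:  # recurrence (ii); the delta term contributes only at k == 2
        return (f_t(0, k) + 3 * f_t(1, k - 1) + f_t(2, k - 2)
                + (4 if k == 2 else 0))
    # recurrence (iii), t >= 2; the delta product is 1 only for k == 1, t == 2
    return (f_t(t - 1, k) + 2 * f_t(t, k - 1) + f_t(t + 1, k - 2)
            + (2 if (k == 1 and t == 2) else 0))


def f(k: int) -> int:
    """Number of minimal unavoidable sets of size k, via f(k) = f_0(k - 1)."""
    return f_t(0, k - 1)


print([f(k) for k in range(5, 9)])  # -> [4, 22, 98, 412]
```

Every call either lowers k or keeps k and lowers t, so the memoized recursion terminates. The successive ratios of the printed values (5.5, 4.45, 4.20) drift slowly toward $\gamma$, which is consistent with a competing contribution of order $4^k$.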

The derivation of the asymptotic expansion of f(k) starts with the recurrences 
of Section 3, and proceeds through two steps. The first step is to derive an explicit 
expression for the generating function of f(k), and the second is to obtain the 
asymptotics of the coefficients of that function. The second step is routine, and is 
sketched only briefly. The first step is the interesting one, since it involves 
complicated-looking functional relations that yield a surprising answer. 

In order to analyze the asymptotic behavior of $f(k)$, it is convenient to 
introduce several auxiliary functions. The definitions are not obvious, and came 
from experimenting with the recurrences to find out which functions give the best 
results. First, define the function $s(\cdot, \cdot)$ by 

$$s(i + j, j) = f_j(i), \qquad i, j \ge 0. \tag{1}$$

Next, define the generating function 

$$S_i(y) = \sum_{j \ge 0} s(i, j)\, y^j. \tag{2}$$

Thus, for example, $S_3(y) = 4y + 2y^2$. For $i \ge 3$, recurrence (iii) of Section 3 is 
easily seen to be equivalent to the relation 

$$S_{i+1}(y) = \frac{(1 + y)^2}{y} S_i(y) - \frac{1}{y} s(i, 0) + y\, s(i, 1). \tag{3}$$

Finally, set 

$$S(x, y) = \sum_{i \ge 3} S_i(y)\, x^i. \tag{4}$$

Note that we are only interested in 

$$\sum_{k=5}^{\infty} f(k)\, x^{k-1} = \sum_{k=5}^{\infty} f_0(k - 1)\, x^{k-1} = \sum_{i=4}^{\infty} s(i, 0)\, x^i = \sum_{i=4}^{\infty} S_i(0)\, x^i = S(x, 0).$$

The additional variable y is brought in only in order to exploit the structure of 
recurrences for the $f_t(k)$. From (3) and (i), (ii), (iii) we obtain 

$$S(x, y) = \sum_{i \ge 3} S_i(y)\, x^i = x^3(4y + 2y^2) + \frac{x(1 + y)^2}{y} \sum_{i \ge 3} S_i(y)\, x^i - \frac{x}{y} S(x, 0) + xy \left.\frac{\partial S(x, y)}{\partial y}\right|_{y=0}. \tag{5}$$

Hence 

$$(y - x(1 + y)^2)\, S(x, y) = x^3(4y^2 + 2y^3) - x S(x, 0) + x y^2 \left.\frac{\partial S(x, y)}{\partial y}\right|_{y=0}. \tag{6}$$

This is a complicated partial differential equation that at first sight might seem 
intractable. However, it can be solved explicitly. Differentiating (6) with respect to 
y and then setting y = 0, we have 

$$(1 - 2x)\, S(x, 0) - x \left.\frac{\partial S(x, y)}{\partial y}\right|_{y=0} = 0. \tag{7}$$

Therefore, we can eliminate $\left.\partial S(x, y)/\partial y\right|_{y=0}$ to obtain 

$$(y - x(1 + y)^2)\, S(x, y) = (y^2(1 - 2x) - x)\, S(x, 0) + x^3(4y^2 + 2y^3). \tag{8}$$

On the curve 

$$y = x(1 + y)^2, \tag{9}$$

the coefficient of $S(x, y)$ in (8) vanishes and we have 

$$S(x, 0) = x^3(4y^2 + 2y^3)/(x - y^2(1 - 2x)). \tag{10}$$

Eq. (9) implies that 

$$y = (1 - 2x - (1 - 4x)^{1/2})/(2x)$$

for |x| < 1/4, and substituting this into (10) gives an explicit representation of 
$S(x, 0)$ as an algebraic function of x for |x| < 1/4, 

$$S(x, 0) = x^2\, \frac{(1 - 4x)^{1/2}(1 - 3x + x^2) - 1 + 5x - x^2 - 6x^3}{1 - 7x + 14x^2 - 9x^3}. \tag{11}$$

(Through (8) this also gives an explicit representation of $S(x, y)$ for (x, y) in a 
neighborhood of (0,0), but we do not need this, since $S(x, 0)$ is all that is needed 
to derive the asymptotics of $f(k)$.) 
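
As a sanity check on the closed form, one can expand it as a power series with exact integer arithmetic. The sketch below is illustrative only. It assumes the closed form is $x^2$ times $[(1-4x)^{1/2}(1-3x+x^2) - 1 + 5x - x^2 - 6x^3]$ divided by $1 - 7x + 14x^2 - 9x^3$, which is what substituting (9) into (10) and rationalizing gives, and it uses the standard expansion $(1-4x)^{1/2} = 1 - 2\sum_{n \ge 1} C_{n-1} x^n$ with Catalan numbers $C_m$. The helper names are hypothetical.

```python
from math import comb

N = 12  # keep coefficients of x^0 .. x^(N-1)


def pad(coeffs):
    """Extend a coefficient list with zeros to length N."""
    return list(coeffs) + [0] * (N - len(coeffs))


def mul(a, b):
    """Product of two truncated power series."""
    out = [0] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[:N - i]):
                out[i + j] += ai * bj
    return out


def catalan(m):
    return comb(2 * m, m) // (m + 1)


# sqrt(1 - 4x) as an integer power series
root = [1] + [-2 * catalan(n - 1) for n in range(1, N)]

numer = mul(root, pad([1, -3, 1]))
for i, c in enumerate([-1, 5, -1, -6]):
    numer[i] += c
numer = mul(pad([0, 0, 1]), numer)  # the x^2 prefactor

denom = pad([1, -7, 14, -9])

# power series division; valid because denom[0] == 1
S = []
for n in range(N):
    S.append(numer[n] - sum(denom[i] * S[n - i] for i in range(1, n + 1)))

print(S[:8])  # -> [0, 0, 0, 0, 4, 22, 98, 412]
```

The coefficient of $x^{k-1}$ is $f(k)$, so the output says $f(5) = 4$, $f(6) = 22$, $f(7) = 98$, $f(8) = 412$, in agreement with what the recurrences of Section 3 give.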

The final part of our analysis is now straightforward. The explicit form of 
$S(x, 0)$ shows that $S(x, 0)$ is analytic in |x| < 1/4 except at zeros of the 
denominator, i.e. at $x = 1/\gamma$, where $\gamma = 4.14789903\ldots$ satisfies 

$$\gamma^3 - 7\gamma^2 + 14\gamma - 9 = 0. \tag{12}$$

Direct substitution into the formula for $S(x, 0)$ then shows that $S(x, 0)$ actually 
does have a simple pole at $x = 1/\gamma$, but (in view of the preceding discussion) no 
other singularities in |x| < 1/4. By the standard methods [2, 3, 7, 12], we can 
therefore write 

$$f(k) = f_0(k - 1) = s(k - 1, 0) = [x^{k-1}]\, S(x, 0) = c\gamma^{k-1} + O(4.01^k),$$

where 

$$c = \lim_{x \to 1/\gamma} S(x, 0)(1 - \gamma x) = 0.016762198\ldots \tag{13}$$

and satisfies (after some messy but routine computation best done with a symbolic 
algebra system) 

$$7533c^3 + 10726c^2 + 5068c - 88 = 0. \tag{14}$$
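
The constants can be reproduced numerically. The sketch below is an illustration, not the authors' computation. It solves the two cubics by bisection and, as a cross-check, evaluates the limit in (13) as a residue, assuming the closed form $S(x,0) = x^2 B(x)/D(x)$ with $B(x) = (1-4x)^{1/2}(1-3x+x^2) - 1 + 5x - x^2 - 6x^3$ and $D(x) = 1 - 7x + 14x^2 - 9x^3$. The bracketing intervals and the helper `bisect_root` are choices made here.

```python
from math import sqrt


def bisect_root(func, lo, hi, iters=200):
    """Root of func in [lo, hi], assuming a single sign change there."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


# gamma from (12); the cubic is negative at 4 and positive at 4.5
gamma = bisect_root(lambda g: g**3 - 7 * g**2 + 14 * g - 9, 4.0, 4.5)

# c from (14); the cubic is negative at 0 and positive at 0.1
c_cubic = bisect_root(
    lambda c: 7533 * c**3 + 10726 * c**2 + 5068 * c - 88, 0.0, 0.1)

# c as the limit in (13): near x0 = 1/gamma, D(x) ~ D'(x0)(x - x0) and
# 1 - gamma*x = -gamma*(x - x0), so the limit is -gamma * x0^2 * B(x0) / D'(x0).
x0 = 1 / gamma
B = sqrt(1 - 4 * x0) * (1 - 3 * x0 + x0**2) - 1 + 5 * x0 - x0**2 - 6 * x0**3
D_prime = -7 + 28 * x0 - 27 * x0**2
c_limit = -gamma * x0**2 * B / D_prime

print(gamma, c_cubic, c_limit)
```

Under these assumptions the two values of c should agree to many digits, which ties (13) and (14) to the same pole at $x = 1/\gamma$.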


5. THE NUMBER OF PEBBLE CONFIGURATIONS. In this section we will treat 
the problem of enumerating the number of distinct reachable configurations with 
k pebbles. We denote this number by g(k). As was true for the asymptotics of 
f(k), it is the derivation of an explicit generating function for the g(k) that 
presents the main challenge here. 

As before, let us define $g_t(k)$ to be the number of k-pebble reachable 
configurations where we start with the initial pebble distribution of $1, 2, 2, \ldots, 2, 1$ in 
$L_{t+1}$ and 0 in all $L_s$, $s > t + 1$ (and we restrict ourselves to cells just in 
$B(t) = \bigcup_{s \ge t+1} L_s$). Thus, $g(k) = g_0(k)$ for $k \ge 2$. Arguing along the same lines 
as before, it is not hard to derive the following recurrences for the $g_t(k)$: 

(i′) $g_0(k) = 2g_0(k - 1) + g_1(k) + \delta(k, 2)$; 
(ii′) $g_1(k) = g_0(k - 3) + 2g_1(k - 2) + g_2(k - 1) + g_1(k - 4)$; 
(iii′) For $t \ge 2$, $g_t(k) = g_{t-1}(k - t - 2) + 2g_t(k - t - 1) + g_{t+1}(k - t)$. 

Now set 

$$h_t(k) = g_t(k + t), \qquad H_t(x) = \sum_{k \ge 0} h_t(k)\, x^k, \qquad H(x, y) = \sum_{i \ge 0} H_i(x)\, y^i. \tag{15}$$

TABLE 2. Values of $h_t(k)$. 

Some values of $h_t(k)$ are shown in Table 2. Straightforward computation using 
(15) and (i′), (ii′), (iii′) shows 

$$H(x, y) = x^2 + \left(\frac{1}{y} + 2x + x^2 y\right) H(x, xy) - \frac{1}{y} H(x, 0) + x^4 y\, H_1(x). \tag{16}$$

Since $H_1(x) = \left.\partial H(x, y)/\partial y\right|_{y=0}$, we have 

$$y H(x, y) = x^2 y + (1 + xy)^2 H(x, xy) - H(x, 0) + x^4 y^2 \left.\frac{\partial H(x, y)}{\partial y}\right|_{y=0}. \tag{17}$$

Differentiating (17) with respect to y, and setting y = 0 implies 

$$H(x, 0) = x^2 + x \left.\frac{\partial H(x, y)}{\partial y}\right|_{y=0} + 2x H(x, 0). \tag{18}$$

Substituting 

$$x \left.\frac{\partial H(x, y)}{\partial y}\right|_{y=0} = (1 - 2x) H(x, 0) - x^2$$

into (17) gives 

$$y H(x, y) = (1 + xy)^2 H(x, xy) + (x^3 y^2 - 2x^4 y^2 - 1) H(x, 0) + x^2 y - x^5 y^2, \tag{19}$$

which is the basic relation for $H(x, y)$ we will use. This is more difficult to analyze 
than the corresponding functional equation (8) for $S(x, y)$ but we still can obtain 
significant information about its asymptotic behavior. 

To begin, from (15) and (19) we have 

$$(1 - 2x) H_0(x) = x H_1(x) + x^2,$$
$$H_1(x) = x^2 (H_0(x) + 2H_1(x) + H_2(x)) + x^4 H_1(x), \tag{20}$$

and for $n \ge 2$, 

$$H_n(x) = x^{n+1} (H_{n+1}(x) + 2H_n(x) + H_{n-1}(x)).$$

Therefore, 

$$H_1(x) = \frac{1 - 2x}{x} H_0(x) - x,$$
$$H_2(x) = \frac{1 - x^4}{x^2} H_1(x) - 2H_1(x) - H_0(x) = x^{-3} \left[ \left( (1 - 2x^2 - x^4)(1 - 2x) - x^3 \right) H_0(x) - x^2 (1 - 2x^2 - x^4) \right],$$

and for $n \ge 3$, 

$$H_n(x) = \frac{1}{x^n} \left( (1 - 2x^n) H_{n-1}(x) - x^n H_{n-2}(x) \right).$$

It then follows by induction that 

$$H_n(x) = x^{-\binom{n+1}{2}} \left( q_n(x) H_0(x) - x^2 p_n(x) \right) \tag{21}$$

where 

$$q_1(x) = 1 - 2x, \qquad p_1(x) = 1,$$
$$q_2(x) = (1 - 2x^2 - x^4)(1 - 2x) - x^3, \qquad p_2(x) = 1 - 2x^2 - x^4,$$

and for $n \ge 3$, 

$$q_n(x) = (1 - 2x^n)\, q_{n-1}(x) - x^{2n-1} q_{n-2}(x),$$
$$p_n(x) = (1 - 2x^n)\, p_{n-1}(x) - x^{2n-1} p_{n-2}(x) \tag{22}$$

(where we can consider (21) as a formal power series identity). From (22) we see 
that 

$$q(x) = \lim_{n \to \infty} q_n(x), \qquad p(x) = \lim_{n \to \infty} p_n(x)$$

exist as formal power series and that 

$$[x^k]\, q(x) = [x^k]\, q_n(x), \qquad [x^k]\, p(x) = [x^k]\, p_n(x)$$

for $n \ge k + 1$. Note that by (21), increasing powers of x divide $q_n(x) H_0(x) - x^2 p_n(x)$ 
as $n \to \infty$. Thus, we have 

$$H_0(x) = \frac{x^2 p(x)}{q(x)} \tag{23}$$

as a formal power series. 


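From (22) and (23) it now follows (cf. [5]) that $H_0(x)$ can be written as the 
continued fraction 

$$H_0(x) = \cfrac{x^2}{1 - 2x - \cfrac{x^3}{1 - 2x^2 - x^4 - \cfrac{x^5}{1 - 2x^3 - \cfrac{x^7}{1 - 2x^4 - \cfrac{x^9}{1 - 2x^5 - \cdots}}}}} \tag{24}$$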

Although this continued fraction is similar to some studied by Ramanujan (see [1], 
[15]), it does not seem to have appeared in the literature before. 
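
The recurrences (22) lend themselves to direct computation with truncated power series. The following sketch is illustrative and rests on assumptions stated here: that $q_1 = 1 - 2x$, $p_1 = 1$, $p_2 = 1 - 2x^2 - x^4$, $q_2 = p_2 q_1 - x^3$, and that $q_n = (1 - 2x^n) q_{n-1} - x^{2n-1} q_{n-2}$ (likewise for $p_n$) when $n \ge 3$. It builds q and p up to a chosen order, forms $H_0 = x^2 p/q$, and locates the smallest positive zero of the truncated q by bisection. The truncation order `N` and the bracketing interval are choices made for this example.

```python
N = 40  # keep coefficients of x^0 .. x^(N-1)


def pad(coeffs):
    return list(coeffs) + [0] * (N - len(coeffs))


def mul(a, b):
    """Product of two truncated power series."""
    out = [0] * N
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b[:N - i]):
                out[i + j] += ai * bj
    return out


def step(cur, prev, n):
    """(1 - 2x^n) * cur - x^(2n-1) * prev, truncated at degree N-1."""
    out = cur[:]
    for i in range(N - n):
        out[i + n] -= 2 * cur[i]
    for i in range(N - (2 * n - 1)):
        out[i + 2 * n - 1] -= prev[i]
    return out


def series_div(num, den):
    """Coefficients of num/den; requires den[0] == 1."""
    out = []
    for n in range(N):
        out.append(num[n] - sum(den[i] * out[n - i] for i in range(1, n + 1)))
    return out


p_prev, q_prev = pad([1]), pad([1, -2])          # p_1, q_1
p_cur = pad([1, 0, -2, 0, -1])                   # p_2
q_cur = mul(p_cur, q_prev)                       # (1 - 2x^2 - x^4)(1 - 2x)
q_cur[3] -= 1                                    # ... - x^3 gives q_2
for n in range(3, N):
    p_prev, p_cur = p_cur, step(p_cur, p_prev, n)
    q_prev, q_cur = q_cur, step(q_cur, q_prev, n)

H0 = [0, 0] + series_div(p_cur, q_cur)[:N - 2]   # x^2 * p / q


def q_at(x):
    return sum(c * x**k for k, c in enumerate(q_cur))


lo, hi = 0.40, 0.45   # q is positive at 0.40 and negative at 0.45
for _ in range(100):
    mid = (lo + hi) / 2
    if q_at(mid) > 0:
        lo = mid
    else:
        hi = mid
alpha = 1 / lo

print(H0[:8])  # -> [0, 0, 1, 2, 4, 9, 20, 46]
print(alpha)
```

Since the coefficient of $x^k$ in $H_0$ is $h_0(k) = g(k)$, the first line lists the number of reachable configurations with 2 through 7 pebbles under these assumptions.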

The recurrences (22) imply that $p(x)$ and $q(x)$ are analytic in the disc {x: 
|x| < 1}, and so $H_0(x)$ is meromorphic for |x| < 1. To determine the asymptotic 
behavior of $H_0(x)$, we need to look at the zeros of $q(x)$. It turns out that in the 
disc {x: |x| < 1/2}, $q(x)$ has only a simple zero at $\beta_1 = 1/\alpha$ where $\alpha = 
2.321642199494\ldots$. This implies 

$$h_0(k) = [x^k]\, H_0(x) = c_1 \alpha^k + O(2^k) \tag{25}$$

where 

$$c_1 = -\frac{\beta_1\, p(\beta_1)}{q'(\beta_1)}.$$

In fact, $q(x)$ has zeros of multiplicity one at 

$\beta_1 = 0.430729593\ldots$ 
$\beta_2 = 0.685754744\ldots$ 
$\beta_3 = -0.704352541\ldots$ 
$\beta_4 = 0.782572917\ldots$ 

and no other zeros in {x: |x| < 0.8}. A more careful analysis shows that (25) can be 
improved to 

$$h_0(k) = -\sum_{j=1}^{4} \frac{p(\beta_j)}{q'(\beta_j)}\, \beta_j^{-k+1} + O((5/4)^k), \tag{26}$$

and even better approximations can be obtained with more effort. 

The basic technique for proving (25) is given, for example, in [13]. We give a 
brief sketch here. To begin, computation shows that $q(x)$ starts as follows: 

$$q(x) = 1 - 2x - 2x^2 + x^3 + x^4 + 7x^5 + 2x^6 + 5x^7 - 4x^8 - 7x^9 - 9x^{10} - 14x^{11} - \cdots$$

Let 

$$Q(x) = 1 - 2x - 2x^2 + x^3 + x^4 + 7x^5 + \cdots + 46x^{19} \tag{27}$$


consist of the first 20 terms of $q(x)$. It is not hard to verify that $|Q(x)| > 1/20$ for 
|x| = 1/2. We want to show that $Q_1(x) = q(x) - Q(x)$ is small on |x| = 1/2. For 
|x| = 3/4, computation shows that $|q_1(x)| < 10$, $|q_2(x)| < 10$. By (22), 

$$|q_n(x)| \le \left(1 + 2(3/4)^n\right) |q_{n-1}(x)| + (3/4)^{2n-1} |q_{n-2}(x)|.$$

Therefore, 

$$|q(x)| < 30 \quad \text{for } |x| = 3/4, \tag{28}$$

which implies for |y| = 1/2, 

$$|Q_1(y)| \le \sum_{m \ge 20} \left| [x^m]\, q(x) \right| |y|^m \le 30 \sum_{m=20}^{\infty} (2/3)^m \le 90\, (2/3)^{20}.$$

Since $90\,(2/3)^{20} \approx 0.027 < 1/20$, we have $|Q_1(y)| < |Q(y)|$ for |y| = 1/2, so by 
Rouché's theorem, $Q(y)$ and $q(y)$ have the same number of zeros in {y: |y| < 1/2}. 
However, direct computation shows that $Q(y)$ has exactly one zero in this region, 
and therefore, so does $q(y)$. Consequently, $\beta_1 = 0.430729593\ldots$ is the only zero 
of $q(y)$ in {y: |y| < 1/2}. The recurrence (22) also gives an effective method for 
computing the other zeros $\beta_j$, as well as the values of $p(\beta_j)$, $q'(\beta_j)$ and 
$c_1 = 0.12268707\ldots$. 

It seems unlikely that there is as simple an expression for $g(k)$ as the one we 
have for $f(k)$. Poles of continued fractions such as that of (24) can seldom be 
expressed in closed form, and are expected to be usually transcendental. There are 
few rigorous results or methods. On the other hand, accurate numerical 
approximations are almost always easy to obtain. 


6. SOME HISTORY AND ACKNOWLEDGMENTS. It seems [10] that the original 
problem of showing that $\bigcup_{i=0}^{3} L(i)$ is unavoidable appeared as Question 5 in the 
Spring 1981 Senior Paper of the Tournament of the Towns (in the former Soviet 
Union) where it is attributed to M. Kontsevich. The solution was presented at the 
first World Federation of National Mathematics Competitions Conference held at 
the Univ. of Waterloo in 1990. (We are indebted to Andy Liu for this bit of 
scholarship.) 

We would particularly like to thank Martin Gardner for once again bringing to 
our attention a beautiful problem which looks deceptively simple and yet offers 
interesting challenges. We thank M. Kontsevich for informing us of the references 
[9, 8]. We are also grateful to H. Eriksson and D. E. Knuth for their comments and 
corrections to an earlier draft. 


Drums That Sound the Same 


S. J. Chapman 


1. INTRODUCTION. In 1966 Kac [7] asked the question "Can you hear the shape 
of a drum?", that is, if you know the frequencies at which a drum vibrates, can you 
determine its shape? Mathematically this corresponds to the following problem. If 
$u$ is the displacement of a membrane $D$ from its mean position, then $u$ satisfies 

$$\nabla^2 u = \frac{\partial^2 u}{\partial t^2}, \quad \text{in } D, \tag{1}$$
$$u = 0, \quad \text{on } \partial D. \tag{2}$$

Seeking a solution by separation of variables $u(x, y, t) = \psi(t)\phi(x, y)$ yields 

$$\frac{\phi_{xx} + \phi_{yy}}{\phi} = \frac{\psi_{tt}}{\psi} = \text{constant} = -\lambda, \text{ say}.$$

Hence 

$$u = \sin(\sqrt{\lambda}\, t)\, \phi(x, y), \tag{3}$$

where 

$$\nabla^2 \phi + \lambda \phi = 0, \quad \text{in } D, \tag{4}$$
$$\phi = 0, \quad \text{on } \partial D. \tag{5}$$

This is an eigenvalue problem: there exists a nonzero solution $\phi$ only for certain 
values of $\lambda$ known as eigenvalues. The set of eigenvalues is known as the 
eigenvalue spectrum, and is discrete in this case. We see by equation (3) that the 
eigenvalues $\lambda$ are the squares of the frequencies of vibration, and that each 
eigensolution can be viewed as a standing wave on the domain D. The general 
solution of (1), (2) is a superposition of these special solutions. 

Figure 1 

Kac’s question is now the following: are two domains with the same eigenvalue 
spectrum (where the eigenvalues are counted with multiplicities) necessarily con- 
gruent? 

It has been shown that the eigenvalues do determine certain properties of D, 
for example the area, the circumference, and the number of connected compo- 
nents [7]. However, the answer to the question is in fact no. Figure 1 shows an 
example of two domains with exactly the same eigenvalue spectrum which was 
given by Gordon et al. [5] (see also [6]). 


(a) (b) 


Figure 2 


A simple proof that the eigenvalues are identical has been given by Berard [3] 
(see also [1, 2]), who constructs the map shown in Figure 2, which takes an 
eigenfunction for the first domain and maps it onto an eigenfunction for the 
second domain, with the same eigenvalue $\lambda$. Here A + B means that to obtain the 
value of the function in that triangle we add the values of the function at the 
corresponding points in triangles A and B. We have used different types of lines 
for the edges of the triangles to help make it clear how each should be orientated 
when making this identification. In some cases it is necessary to reflect the triangle 
about its line of symmetry, and this we have indicated by $\bar{A}$. Only the zero function 
maps to the zero function, which implies that for any eigenfunction of the first 
eigenvalue problem there is a corresponding eigenfunction of the second 
eigenvalue problem, with the same eigenvalue $\lambda$. Thus any eigenvalue of the first 
problem is also an eigenvalue of the second problem (including multiplicities). A 
similar map of an eigenfunction of the second problem to one of the first shows 
that any eigenvalue of the second problem is also an eigenvalue of the first, and 
therefore the two domains are isospectral. 

Here we give an interpretation of the transposition of a solution of the first 
problem to one of the second problem in terms of paper folding. This will allow us 
to generate many new isospectral domains, including a simple example in which 
the eigenvalues can be calculated explicitly. We note that the method of transposi- 
tion has also recently been used by Buser et al. [4] to generate new examples of 
isospectral plane domains. 


2. PAPER FOLDING. Consider a paper cutout of a domain, and a function which 
is zero on the boundary of the domain. We now fold the paper to create a new 
domain. We define a function, the transposition, on the new domain, by adding the 
values of the original function at points that lie on top of one another, with the 
convention that if the paper is reversed then the function is subtracted rather than 
added. This function will automatically be zero on the boundary of the new 
domain, since it will be zero along any fold of the paper, as well as on any edge. 
The reader may find it helpful in what follows to actually construct the shapes by 
folding paper. 

Figure 3, panels (a) and (b) 

Figure 4 

For example consider the domain shown in Figure 2a, and a function on this 
domain which is zero on the boundary. We label the triangles A to G on the front 
and $\bar{A}$ to $\bar{G}$ on the back and fold the domain as shown in Figure 3 (where a dotted 
line indicates a fold of the paper). We then obtain a function on the domain shown 
in Figure 4 which is zero on the boundary. 

We now take several copies of the original domain D and fold them to create 
domains $D_1$, $D_2$, etc. We glue these together to create a new domain $D^*$, and 
define the transposition on this domain to be the sum of the transpositions on $D_1$, 
$D_2$, etc. Now, if we can glue the domains $D_1$, $D_2$, etc. together in such a way that 
the first derivative of the transposition is continuous, then we will have actually 
created an eigenfunction on the new domain $D^*$.¹ 

In order for the first derivative of the transposition to be continuous it is 
sufficient that 


(1) Every fold lies along an outside edge of the new shape. 

(2) Each edge of each copy of the original shape that lies in the interior of the 
final shape must be adjacent to its reflection on an associated copy of the 
original shape. 


Figure 5, panels (a) and (b) 


For example, the first derivative of the transposition is discontinuous across the 
lines (1) and (2) in Figure 4. However, if we add the same initial drum shape 
folded as shown in Figure 5 then we ensure continuity of the first derivative across 
the lines (1) and (2) (Figure 6), though the first derivative is now discontinuous 
across other lines. Note that the continuity of the transposed function itself is 
automatic since the original function is zero on the boundary of each component. 

¹The resulting function is once differentiable and satisfies the eigenequation except possibly on the 
seams. Such a function is a weak solution of the equation and therefore by elliptic regularity a strong 
solution; see G. Folland, Introduction to Partial Differential Equations, Princeton University Press, 
1976: specifically apply Corollary 6.28 repeatedly and then Corollary 6.10 to find that the transposed 
function is indeed an eigenfunction. 

Figure 7 shows how the pieces fit together for the example given in the 
introduction. Three copies of the original shape (Fig. 2a) are folded along the 
dotted lines shown in Fig. 7a, to give the shapes shown in Fig. 7c (Figure 7b shows 
a three dimensional view of how each piece will look before it is squashed flat). 
These shapes are then superimposed to create the shape shown in Fig. 2b (the 
dotted lines in Fig. 7c indicate the position of each component in the new shape). 

Another example of isospectral domains, and the transposition of a solution on 
one domain onto a solution on the other domain, is given in Figures 8 and 9. The 
cuts in these figures are to be interpreted as having zero width, and are shown for 
clarity. 

Since the method of construction of the transposition depends only on folding 
along the edges of the triangles, there is no need for the triangles to be right- 
angled. All that is important is that the two triangles adjacent to a fold lie on top 
of one another when the paper is folded. If we think of the shapes in Figure 2 as 
being constructed from a single triangle A by a series of reflections about its 
edges, then it is not the shape of A, but the series of reflections which is 
important. Choose any other triangle in place of A in Figure 10a, and perform the 
same series of reflections to obtain a new shape. Now place the same triangle in 
position d of Figure 10b, with the same orientation, and perform the series of 
reflections that created 10b from the basic right-angled triangle d. The two shapes 
obtained will then be isospectral, since the same map transposing eigenfunctions of 
one domain onto the other domain will work as before. For example, if we use the 
triangle shown in Figure 11a as our basic building block, we find that the domains 
shown in Figure 11b are isospectral. 

It is not necessary for the triangle to be isosceles. If we take the triangle shown 
in Figure 12a as our basic building block, we find that the domains shown in Figure 
12b are isospectral. 

Figure 9 

Figure 11 


By the same considerations, it is not even necessary for the basic shape to be a 
triangle. Any shape with at least three edges will do. We simply choose three edges 
to represent the three sides of the triangle, about which we will reflect the shape. 
If we then follow the same pattern of reflection that created the original shapes of 
Fig. 2 from the basic right-angled triangle, then we have again isospectral drums. 
The example shown in Figure 13 uses squares. 

In fact, the basic starting shape can be as complicated as you like. To construct 
different shapes, take any of the previous shapes constructed of triangles, squares 
etc. and fold both shapes until a single triangle (for example) remains. Place the 
two resulting triangles on top of one another (with the correct orientation, i.e. so 
that the solid, dotted and dashed lines match up). Now cut out shapes as when 
making paper dolls. The shapes obtained when the paper is unfolded will again 
have exactly the same eigenvalues as each other, since the same one-to-one 
correspondence between solutions will hold as before. We note that the cutout will 
be in one piece if and only if there is a segment left uncut on each edge. 

Figure 14 shows a simple example. These isospectral domains were also 
discovered by Gordon et al. More exotic shapes can also be made, as shown in Figure 15. 

Figure 12 


In this way a simpler example of drums with the same eigenvalues can be 
constructed—one in which the eigenvalues can actually be calculated explicitly. 
Consider the original example, folded and cut along one edge as shown in Figure 
16a. Then the drums obtained are as shown in Figure 16b. Discarding the single 
small triangle, which appears once in each drum, we have the domains shown in 
Figure 17. The spectrum of each of the disconnected domains in Figure 17 is equal 
to the union of the spectra of each of the components (with the multiplicity of an 
eigenvalue being equal to the sum of its multiplicities in the components), since 
each of the components vibrates independently. The eigenfunctions for a rectangle 
of length a and width b are 

$$\sin\frac{n\pi x}{a}\, \sin\frac{m\pi y}{b}, \qquad n, m \text{ integers},$$

with corresponding eigenvalues $\lambda = \pi^2((n/a)^2 + (m/b)^2)$. For a right-angled 
isosceles triangle with short sides of length c the eigenfunctions are 

$$\sin\frac{i\pi x}{c}\, \sin\frac{j\pi y}{c} - \sin\frac{j\pi x}{c}\, \sin\frac{i\pi y}{c}, \qquad i, j \text{ integers}, \quad i > j,$$

with corresponding eigenvalues $\lambda = \pi^2((i/c)^2 + (j/c)^2)$. Thus we find that the 
eigenvalues for each domain are as shown in Fig. 17. We shall now show that the 
combined eigenvalues of the two domains of Figure 17a are identical to the 
combined eigenvalues of the two domains of Figure 17b. 

Figure 14 
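
The triangle eigenfunctions quoted above can be checked numerically. The sketch below is only an illustration. It assumes the triangle is placed as $0 \le y \le x \le c$ (a coordinate choice made here, not stated in the text) and compares a five-point finite-difference Laplacian with $-\lambda\phi$ at an interior point. The function names and the step size `h` are hypothetical.

```python
import math


def triangle_mode(i, j, c, x, y):
    """sin(i pi x/c) sin(j pi y/c) - sin(j pi x/c) sin(i pi y/c)."""
    k = math.pi / c
    return (math.sin(i * k * x) * math.sin(j * k * y)
            - math.sin(j * k * x) * math.sin(i * k * y))


def laplacian(func, x, y, h=1e-4):
    """Five-point finite-difference approximation of func_xx + func_yy."""
    return (func(x + h, y) + func(x - h, y) + func(x, y + h)
            + func(x, y - h) - 4 * func(x, y)) / (h * h)


i, j, c = 3, 1, 2.0
lam = math.pi**2 * ((i / c)**2 + (j / c)**2)


def phi(x, y):
    return triangle_mode(i, j, c, x, y)


# residual of the eigenvalue equation at an interior point; should be near 0
print(laplacian(phi, 1.3, 0.4) + lam * phi(1.3, 0.4))
# boundary values on the edges y = x, y = 0 and x = c
print(phi(0.7, 0.7), phi(0.9, 0.0), phi(c, 0.5))
```

The antisymmetric combination is what makes the function vanish on the edge $y = x$. Each product term alone is an eigenfunction of the square, and the difference of the two restricts to the triangle.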

With N even we set N = 2n and M = m. Then we have $(N/2)^2 + M^2 = n^2 + m^2$. 
When N is odd we set i = max(N, 2M), j = min(N, 2M). Then $(N/2)^2 + M^2 
= (i/2)^2 + (j/2)^2$, and i > j. This takes care of the eigenvalues in which one of 
i, j is even and the other is odd. If we set i = I + J and j = I − J then $(I^2 + 
J^2)/2 = (i/2)^2 + (j/2)^2$, and i > j. This takes care of the eigenvalues in which 
either i and j are both even or both odd. 
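
The index bookkeeping above can be confirmed by brute force. The following sketch is an illustration under stated assumptions: all indices range over positive integers, with I > J and i > j, and eigenvalues are scaled by $4/\pi^2$ so that they become integers. One pair of drums then contributes $N^2 + 4M^2$ and $2(I^2 + J^2)$, and the other contributes $4(n^2 + m^2)$ and $i^2 + j^2$. The function names are hypothetical.

```python
from collections import Counter
from math import isqrt


def spectrum_NM_IJ(limit):
    """Scaled eigenvalues 4*lambda/pi^2 <= limit, with multiplicity."""
    counts = Counter()
    r = isqrt(limit)
    for N in range(1, r + 1):
        for M in range(1, r + 1):
            v = N * N + 4 * M * M          # 4 * ((N/2)^2 + M^2)
            if v <= limit:
                counts[v] += 1
    for I in range(1, r + 1):
        for J in range(1, I):              # I > J >= 1
            v = 2 * (I * I + J * J)        # 4 * (I^2 + J^2) / 2
            if v <= limit:
                counts[v] += 1
    return counts


def spectrum_nm_ij(limit):
    """Scaled eigenvalues for the other pair, with multiplicity."""
    counts = Counter()
    r = isqrt(limit)
    for n in range(1, r + 1):
        for m in range(1, r + 1):
            v = 4 * (n * n + m * m)        # 4 * (n^2 + m^2)
            if v <= limit:
                counts[v] += 1
    for i in range(1, r + 1):
        for j in range(1, i):              # i > j >= 1
            v = i * i + j * j              # 4 * ((i/2)^2 + (j/2)^2)
            if v <= limit:
                counts[v] += 1
    return counts


print(spectrum_NM_IJ(5000) == spectrum_nm_ij(5000))  # -> True
```

Comparing `Counter` objects checks multiplicities as well as values. For instance the scaled eigenvalue 20 occurs three times on each side.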

Finally, we note that the same procedure works if the boundary condition 

$$u = 0 \quad \text{on } \partial D,$$

is modified to 

$$\frac{\partial u}{\partial n} = 0 \quad \text{on } \partial D,$$

if we modify our convention and add reflected triangles instead of subtracting 
them. Thus all the isospectral domains found previously are also isospectral with 
this new boundary condition. 


Down With Determinants! 


Sheldon Axler 


1. INTRODUCTION. Ask anyone why a square matrix of complex numbers has 
an eigenvalue, and you'll probably get the wrong answer, which goes something 
like this: The characteristic polynomial of the matrix (which is defined via 
determinants) has a root (by the fundamental theorem of algebra); this root is an 
eigenvalue of the matrix. 

What’s wrong with that answer? It depends upon determinants, that’s what. 
Determinants are difficult, non-intuitive, and often defined without motivation. As 
we'll see, there is a better proof—one that is simpler, clearer, provides more 
insight, and avoids determinants. 

This paper will show how linear algebra can be done better without determi- 
nants. Without using determinants, we will define the multiplicity of an eigenvalue 
and prove that the number of eigenvalues, counting multiplicities, equals the 
dimension of the underlying space. Without determinants, we’ll define the charac- 
teristic and minimal polynomials and then prove that they behave as expected. 
Next, we will easily prove that every matrix is similar to a nice upper-triangular 
one. Turning to inner product spaces, and still without mentioning determinants, 
we'll have a simple proof of the finite-dimensional Spectral Theorem. 

Determinants are needed in one place in the undergraduate mathematics 
curriculum: the change of variables formula for multi-variable integrals. Thus at 
the end of this paper we’ll revive determinants, but not with any of the usual 
abstruse definitions. We’ll define the determinant of a matrix to be the product of 
its eigenvalues (counting multiplicities). This easy-to-remember definition leads to 
the usual formulas for computing determinants. We’ll derive the change of 
variables formula for multi-variable integrals in a fashion that makes the appear- 
ance of the determinant there seem natural. 

A few friends who use determinants in their research have expressed unease at 
the title of this paper. I know that determinants play an honorable role in some 
areas of research, and I do not mean to belittle their importance when they are 
indispensable. But most mathematicians and most students of mathematics will 
have a clearer understanding of linear algebra if they use the determinant-free 
approach to the basic structure theorems. 

The theorems in this paper are not new; they will already be familiar to most 
readers. Some of the proofs and definitions are new, although many parts of this 
approach have been around in bits and pieces, but without the attention they 
deserved. For example, at a recent annual meeting of the AMS and MAA, I 
looked through every linear algebra text on display. Out of over fifty linear algebra 
texts offered for sale, only one obscure book gave a determinant-free proof that 
eigenvalues exist, and that book did not manage to develop other key parts of 
linear algebra without determinants. The anti-determinant philosophy advocated 
in this paper is an attempt to counter the undeserved dominance of determinant- 
dependent methods. 

This paper focuses on showing that determinants should be banished from 
much of the theoretical part of linear algebra. Determinants are also useless in the 
computational part of linear algebra. For example, Cramer’s rule for solving 
systems of linear equations is already worthless for 10 × 10 systems, not to 
mention the much larger systems often encountered in the real world. Many 
computer programs efficiently calculate eigenvalues numerically—none of them 
uses determinants. To emphasize the point, let me quote a numerical analyst. 
Henry Thacher, in a review (SIAM News, September 1988) of the Turbo Pascal 
Numerical Methods Toolbox, writes, 


I find it hard to conceive of a situation in which the numerical value of a determinant is needed: 
Cramer’s rule, because of its inefficiency, is completely impractical, while the magnitude of the 
determinant is an indication of neither the condition of the matrix nor the accuracy of the 
solution. 


2. EIGENVALUES AND EIGENVECTORS. The basic objects of study in linear 
algebra can be thought of as either linear transformations or matrices. Because a 
basis-free approach seems more natural, this paper will mostly use the language of 
linear transformations; readers who prefer the language of matrices should have 
no trouble making the appropriate translation. The term linear operator will mean 
a linear transformation from a vector space to itself; thus a linear operator 
corresponds to a square matrix (assuming some choice of basis). 

Notation used throughout the paper: n denotes a positive integer, V denotes an 
n-dimensional complex vector space, T denotes a linear operator on V, and I 
denotes the identity operator. 

A complex number $\lambda$ is called an eigenvalue of T if $T - \lambda I$ is not injective. 
Here is the central result about eigenvalues, with a simple proof that avoids 
determinants. 


Theorem 2.1. Every linear operator on a finite-dimensional complex vector space has 
an eigenvalue. 


Proof: To show that T (our linear operator on V) has an eigenvalue, fix any 
non-zero vector $v \in V$. The vectors $v, Tv, T^2v, \ldots, T^nv$ cannot be linearly 
independent, because V has dimension n and we have n + 1 vectors. Thus there exist 
complex numbers $a_0, \ldots, a_n$, not all 0, such that 

$$a_0 v + a_1 Tv + \cdots + a_n T^n v = 0.$$

Make the a’s the coefficients of a polynomial, which can be written in factored 
form as 

$$a_0 + a_1 z + \cdots + a_n z^n = c(z - r_1) \cdots (z - r_m),$$

where c is a non-zero complex number, each $r_j$ is complex, and the equation holds 
for all complex z. We then have 

$$0 = (a_0 I + a_1 T + \cdots + a_n T^n)v = c(T - r_1 I) \cdots (T - r_m I)v,$$

which means that $T - r_j I$ is not injective for at least one j. In other words, T has 
an eigenvalue. □ 
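
The proof is constructive enough to run. Here is a minimal Python sketch for the 2 × 2 case only, written for illustration and not taken from the paper. It finds a dependence among $v$, $Tv$, $T^2v$ by elimination, factors the resulting polynomial with the quadratic formula, and then reads off an eigenvector from the factored product. The function names, the default starting vector, and the tolerance are assumptions of this example, and $v$ must be non-zero.

```python
import cmath


def mat_vec(T, v):
    """Apply a 2x2 matrix (list of rows) to a length-2 vector."""
    return [T[0][0] * v[0] + T[0][1] * v[1],
            T[1][0] * v[0] + T[1][1] * v[1]]


def eigenpair_2x2(T, v=(1, 0), tol=1e-12):
    """Return (eigenvalue, eigenvector) of T, following the proof's steps."""
    v = [complex(v[0]), complex(v[1])]
    p = 0 if abs(v[0]) >= abs(v[1]) else 1   # index of a non-zero entry of v
    q = 1 - p
    Tv = mat_vec(T, v)
    mu = Tv[p] / v[p]
    w = [Tv[0] - mu * v[0], Tv[1] - mu * v[1]]   # w[p] is 0 by construction
    if abs(w[q]) <= tol:
        return mu, v        # v and Tv are already dependent: Tv = mu * v
    T2v = mat_vec(T, Tv)
    # Write T^2 v = s*v + t*w; since w = Tv - mu*v this is the dependence
    # T^2 v - t*Tv - (s - t*mu)*v = 0 among v, Tv, T^2 v.
    s = T2v[p] / v[p]
    t = (T2v[q] - s * v[q]) / w[q]
    # Factor z^2 - t*z - (s - t*mu) = (z - r1)(z - r2).
    disc = cmath.sqrt(t * t + 4 * (s - t * mu))
    r1, r2 = (t + disc) / 2, (t - disc) / 2
    # 0 = (T - r1 I)(T - r2 I)v: either (T - r2 I)v = 0, or that vector is
    # a non-zero element of the kernel of T - r1 I.
    u = [Tv[0] - r2 * v[0], Tv[1] - r2 * v[1]]
    if max(abs(u[0]), abs(u[1])) <= tol:
        return r2, v
    return r1, u


rotation = [[0, -1], [1, 0]]          # no real eigenvalues
lam, x = eigenpair_2x2(rotation)
print(lam, x)
```

For the rotation matrix the dependence found is $T^2 v + v = 0$, the polynomial is $z^2 + 1$, and the routine returns a complex eigenvalue, which illustrates why the theorem is stated over the complex numbers.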


Recall that a vector $v \in V$ is called an eigenvector of T if $Tv = \lambda v$ for some 
eigenvalue $\lambda$. The next proposition, which has a simple, determinant-free proof, 
obviously implies that the number of distinct eigenvalues of T cannot exceed the 
dimension of V. 


Proposition 2.2. Non-zero eigenvectors corresponding to distinct eigenvalues of T are 
linearly independent. 


Proof: Suppose that $v_1, \ldots, v_m$ are non-zero eigenvectors of T corresponding to 
distinct eigenvalues $\lambda_1, \ldots, \lambda_m$. We need to prove that $v_1, \ldots, v_m$ are linearly 
independent. To do this, suppose $a_1, \ldots, a_m$ are complex numbers such that 

$$a_1 v_1 + \cdots + a_m v_m = 0.$$

Apply the linear operator $(T - \lambda_2 I)(T - \lambda_3 I) \cdots (T - \lambda_m I)$ to both sides of the 
equation above, getting 

$$a_1 (\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3) \cdots (\lambda_1 - \lambda_m)\, v_1 = 0.$$

Thus $a_1 = 0$. In a similar fashion, $a_j = 0$ for each j, as desired. □ 


3. GENERALIZED EIGENVECTORS. Unfortunately, the eigenvectors of T need 
not span V. For example, the linear operator on $\mathbf{C}^2$ whose matrix is 

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

has only one eigenvalue, namely 0, and its eigenvectors form a one-dimensional 
subspace of $\mathbf{C}^2$. We will see, however, that the generalized eigenvectors (defined 
below) of T always span V. 

A vector $v \in V$ is called a generalized eigenvector of T if 

$$(T - \lambda I)^k v = 0$$

for some eigenvalue $\lambda$ of T and some positive integer k. Obviously, the set of 
generalized eigenvectors of T corresponding to an eigenvalue $\lambda$ is a subspace of V. 
The following lemma shows that in the definition of generalized eigenvector, 
instead of allowing an arbitrary power of $T - \lambda I$ to annihilate v, we could have 
restricted attention to the n-th power, where n equals the dimension of V. As 
usual, ker is an abbreviation for kernel (the set of vectors that get mapped to 0). 
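
A small computation makes the distinction concrete. The sketch below is illustrative. It assumes the example matrix is the 2 × 2 matrix with rows (0, 1) and (0, 0), and the helper names are hypothetical. It shows a vector that is not an eigenvector but is annihilated by the second power of $T - 0 \cdot I$, so it is a generalized eigenvector for the eigenvalue 0.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(A, v):
    """Apply a square matrix to a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


T = [[0, 1],
     [0, 0]]            # assumed form of the example matrix
e1, e2 = [1, 0], [0, 1]

T_squared = mat_mul(T, T)

print(mat_vec(T, e1))          # -> [0, 0]: e1 is an eigenvector for 0
print(mat_vec(T, e2))          # -> [1, 0]: e2 is not an eigenvector
print(mat_vec(T_squared, e2))  # -> [0, 0]: e2 is a generalized eigenvector
```

Here n = 2 and $T^2 = 0$, so every vector of $\mathbf{C}^2$ lies in $\ker(T - 0 \cdot I)^2$ even though the eigenvectors alone span only a line.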


Lemma 3.1. The set of generalized eigenvectors of T corresponding to an eigenvalue 
dA equals ker(T — AI)". 


Proof: Obviously, every element of ker(T — AJ)” is a generalized eigenvector of T 
corresponding to A. To prove the inclusion in the other direction, let v be a 
generalized eigenvector of T corresponding to A. We need to prove that 
(T —AI)"v = 0. Clearly, we can assume that v # 0, so there is a smallest non- 
negative integer k such that (T — AI)*v = 0. We will be done if we show that 
k <n. This will be proved by showing that 


v,(T —Al)v,(T —Al)’v,...,(T — Al)*'v (3.2) 
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are linearly independent vectors; we will then have k linearly independent ele- 
ments in an n-dimensional space, which implies that k <n. 

To prove the vectors in (3.2) are linearly independent, suppose $a_0, \dots, a_{k-1}$ are
complex numbers such that

$$a_0 v + a_1 (T - \lambda I)v + \cdots + a_{k-1} (T - \lambda I)^{k-1} v = 0. \tag{3.3}$$

Apply $(T - \lambda I)^{k-1}$ to both sides of the equation above, getting
$a_0 (T - \lambda I)^{k-1} v = 0$, which implies that $a_0 = 0$ (because $(T - \lambda I)^{k-1} v \ne 0$
by the minimality of k). Now apply $(T - \lambda I)^{k-2}$ to both sides of (3.3), getting
$a_1 (T - \lambda I)^{k-1} v = 0$, which implies that $a_1 = 0$. Continuing in this fashion, we see
that $a_j = 0$ for each j, as desired. ∎
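
The chain (3.2) can be watched in a small case. The following is a minimal
sketch in Python, not part of the original argument; the matrix T, the
eigenvalue 2, and the helper names mat_vec and chain are assumptions chosen
only for illustration.

```python
def mat_vec(m, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def chain(n_op, v):
    """Return v, Nv, N^2 v, ... up to (not including) the first zero vector.

    Only terminates when some power of n_op annihilates v, that is, when v
    is a generalized eigenvector for the eigenvalue that was subtracted off.
    """
    out = []
    while any(v):
        out.append(v)
        v = mat_vec(n_op, v)
    return out


lam = 2  # illustrative eigenvalue
T = [[2, 1, 0],
     [0, 2, 1],
     [0, 0, 2]]
# N = T - lam * I
N = [[T[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]

vectors = chain(N, [0, 0, 1])
k = len(vectors)  # smallest k with (T - lam I)^k v = 0
print(k, vectors)
```

Here the chain consists of three standard basis vectors, which are visibly
linearly independent, so k cannot exceed the dimension 3.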


The next result is the key tool we’ll use to give a description of the structure of a 
linear operator. 


Proposition 3.4. The generalized eigenvectors of T span V. 


Proof: The proof will be by induction on n, the dimension of V. Obviously, the 
result holds when n = 1. 

Suppose that $n > 1$ and that the result holds for all vector spaces of dimension
less than n. Let $\lambda$ be any eigenvalue of T (one exists by Theorem 2.1). We first
show that

$$V = \underbrace{\ker (T - \lambda I)^n}_{V_1} \oplus \underbrace{\operatorname{ran} (T - \lambda I)^n}_{V_2}; \tag{3.5}$$

here, as usual, ran is an abbreviation for range. To prove (3.5), suppose
$v \in V_1 \cap V_2$. Then $(T - \lambda I)^n v = 0$ and there exists $u \in V$ such that
$(T - \lambda I)^n u = v$. Applying $(T - \lambda I)^n$ to both sides of the last equation, we have
$(T - \lambda I)^{2n} u = 0$. This implies that $(T - \lambda I)^n u = 0$ (by Lemma 3.1), which
implies that $v = 0$. Thus

$$V_1 \cap V_2 = \{0\}. \tag{3.6}$$

Because $V_1$ and $V_2$ are the kernel and range of a linear operator on V, we have

$$\dim V = \dim V_1 + \dim V_2. \tag{3.7}$$

Equations (3.6) and (3.7) imply (3.5).

Note that $V_1 \ne \{0\}$ (because $\lambda$ is an eigenvalue of T), and thus $\dim V_2 < n$.
Furthermore, because T commutes with $(T - \lambda I)^n$, we easily see that T maps $V_2$
into $V_2$. By our induction hypothesis, $V_2$ is spanned by the generalized
eigenvectors of $T|_{V_2}$, each of which is obviously also a generalized eigenvector of T.
Everything in $V_1$ is a generalized eigenvector of T, and hence (3.5) gives the
desired result. ∎


A nice corollary of the last proposition is that if 0 is the only eigenvalue of T,
then T is nilpotent (recall that an operator is called nilpotent if some power of it
equals 0). Proof: If 0 is the only eigenvalue of T, then every vector in V is a
generalized eigenvector of T corresponding to the eigenvalue 0 (by Proposition
3.4); Lemma 3.1 then implies that $T^n = 0$.

Non-zero eigenvectors corresponding to distinct eigenvalues are linearly inde- 
pendent (Proposition 2.2). We need an analogous result with generalized eigenvec- 
tors replacing eigenvectors. This can be proved by following the basic pattern of 
the proof of Proposition 2.2, as we now do.


Proposition 3.8. Non-zero generalized eigenvectors corresponding to distinct
eigenvalues of T are linearly independent.


Proof: Suppose that $v_1, \dots, v_m$ are non-zero generalized eigenvectors of T
corresponding to distinct eigenvalues $\lambda_1, \dots, \lambda_m$. We need to prove that
$v_1, \dots, v_m$ are linearly independent. To do this, suppose $a_1, \dots, a_m$ are complex
numbers such that

$$a_1 v_1 + \cdots + a_m v_m = 0. \tag{3.9}$$

Let k be the smallest positive integer such that $(T - \lambda_1 I)^k v_1 = 0$. Apply the linear
operator

$$(T - \lambda_1 I)^{k-1} (T - \lambda_2 I)^n \cdots (T - \lambda_m I)^n$$

to both sides of (3.9), getting

$$a_1 (T - \lambda_1 I)^{k-1} (T - \lambda_2 I)^n \cdots (T - \lambda_m I)^n v_1 = 0, \tag{3.10}$$

where we have used Lemma 3.1. If we rewrite $(T - \lambda_2 I)^n \cdots (T - \lambda_m I)^n$ in (3.10)
as

$$\bigl((T - \lambda_1 I) + (\lambda_1 - \lambda_2) I\bigr)^n \cdots \bigl((T - \lambda_1 I) + (\lambda_1 - \lambda_m) I\bigr)^n$$

and then expand each $\bigl((T - \lambda_1 I) + (\lambda_1 - \lambda_j) I\bigr)^n$ using the binomial theorem and
multiply everything together, we get a sum of terms. Except for the term

$$(\lambda_1 - \lambda_2)^n \cdots (\lambda_1 - \lambda_m)^n,$$

each term in this sum includes a power of $(T - \lambda_1 I)$, which when combined with
the $(T - \lambda_1 I)^{k-1}$ on the left and the $v_1$ on the right in (3.10) gives 0. Hence (3.10)
becomes the equation

$$a_1 (\lambda_1 - \lambda_2)^n \cdots (\lambda_1 - \lambda_m)^n (T - \lambda_1 I)^{k-1} v_1 = 0.$$

Thus $a_1 = 0$. In a similar fashion, $a_j = 0$ for each j, as desired. ∎


Now we can pull everything together into the following structure theorem. Part
(b) allows us to interpret each linear transformation appearing in parts (c) and (d)
as a linear operator from $U_j$ to itself.


Theorem 3.11. Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of T, with $U_1, \dots, U_m$
denoting the corresponding sets of generalized eigenvectors. Then

(a) $V = U_1 \oplus \cdots \oplus U_m$;

(b) T maps each $U_j$ into itself;

(c) each $(T - \lambda_j I)|_{U_j}$ is nilpotent;

(d) each $T|_{U_j}$ has only one eigenvalue, namely $\lambda_j$.


Proof: The proof of (a) follows immediately from Propositions 3.8 and 3.4.
To prove (b), suppose $v \in U_j$. Then $(T - \lambda_j I)^k v = 0$ for some positive integer
k. We have

$$(T - \lambda_j I)^k T v = T (T - \lambda_j I)^k v = T(0) = 0.$$

Thus $Tv \in U_j$, as desired.
The proof of (c) follows immediately from the definition of a generalized
eigenvector and Lemma 3.1.
To prove (d), let $\lambda'$ be an eigenvalue of $T|_{U_j}$ with corresponding non-zero
eigenvector $v \in U_j$. Then $(T - \lambda_j I)v = (\lambda' - \lambda_j)v$, and hence

$$(T - \lambda_j I)^k v = (\lambda' - \lambda_j)^k v$$

for each positive integer k. Because v is a generalized eigenvector of T
corresponding to $\lambda_j$, the left-hand side of this equation is 0 for some k. Thus
$\lambda' = \lambda_j$, as desired. ∎


4. THE MINIMAL POLYNOMIAL. Because the space of linear operators on V is
finite dimensional, there is a smallest positive integer k such that

$$I, T, T^2, \dots, T^k$$

are not linearly independent. Thus there exist unique complex numbers
$a_0, \dots, a_{k-1}$ such that

$$a_0 I + a_1 T + a_2 T^2 + \cdots + a_{k-1} T^{k-1} + T^k = 0.$$

The polynomial

$$a_0 + a_1 z + a_2 z^2 + \cdots + a_{k-1} z^{k-1} + z^k$$

is called the minimal polynomial of T. It is the monic polynomial p of smallest
degree such that $p(T) = 0$ (a monic polynomial is one whose term of highest
degree has coefficient 1).
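
The definition above is directly computable: look for the first power of T that
is a linear combination of the earlier powers. The following Python sketch does
this with exact rational arithmetic. It is an illustration only, not taken from
the text; the function names solve and minimal_polynomial and the sample
matrices are assumptions.

```python
from fractions import Fraction


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve(columns, target):
    """Solve sum_i x_i * columns[i] = target exactly, or return None.

    Assumes the given columns are linearly independent.
    """
    rows, k = len(target), len(columns)
    aug = [[Fraction(columns[c][r]) for c in range(k)] + [Fraction(target[r])]
           for r in range(rows)]
    pivot_row = 0
    for col in range(k):
        p = next((r for r in range(pivot_row, rows) if aug[r][col] != 0), None)
        if p is None:
            return None  # not reached when the columns are independent
        aug[pivot_row], aug[p] = aug[p], aug[pivot_row]
        piv = aug[pivot_row][col]
        aug[pivot_row] = [x / piv for x in aug[pivot_row]]
        for r in range(rows):
            if r != pivot_row and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[pivot_row])]
        pivot_row += 1
    # The system is consistent only if every leftover row reads 0 = 0.
    if any(aug[r][k] != 0 for r in range(pivot_row, rows)):
        return None
    return [aug[c][k] for c in range(k)]


def minimal_polynomial(t):
    """Return [a_0, ..., a_{k-1}, 1], the coefficients of the minimal polynomial."""
    n = len(t)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # T^0 = I
    powers = []  # flattened I, T, ..., kept linearly independent
    while True:
        flat = [x for row in power for x in row]
        coeffs = solve(powers, flat)
        if coeffs is not None:  # T^k = sum_i coeffs[i] * T^i
            return [-c for c in coeffs] + [Fraction(1)]
        powers.append(flat)
        power = mat_mul(power, t)


print(minimal_polynomial([[2, 1, 0], [0, 2, 0], [0, 0, 3]]))
```

For the sample matrix the result corresponds to $z^3 - 7z^2 + 16z - 12 = (z - 2)^2 (z - 3)$.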

The next theorem connects the minimal polynomial to the decomposition of V 
as a direct sum of generalized eigenvectors. 


Theorem 4.1. Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of T, let $U_j$ denote the set of
generalized eigenvectors corresponding to $\lambda_j$, and let $\alpha_j$ be the smallest positive
integer such that $(T - \lambda_j I)^{\alpha_j} v = 0$ for every $v \in U_j$. Let

$$p(z) = (z - \lambda_1)^{\alpha_1} \cdots (z - \lambda_m)^{\alpha_m}. \tag{4.2}$$

Then
(a) p is the minimal polynomial of T;
(b) p has degree at most dim V;
(c) if q is a polynomial such that $q(T) = 0$, then q is a polynomial multiple of p.


Proof: We will prove first (b), then (c), then (a).

To prove (b), note that each $\alpha_j$ is at most the dimension of $U_j$ (by Lemma 3.1
applied to $T|_{U_j}$). Because $V = U_1 \oplus \cdots \oplus U_m$ (by Theorem 3.11(a)), the $\alpha_j$'s can
add up to at most the dimension of V. Thus (b) holds.

To prove (c), suppose q is a polynomial such that $q(T) = 0$. If we show that q is
a polynomial multiple of each $(z - \lambda_j)^{\alpha_j}$, then (c) will hold. To do this, fix j. The
polynomial q has the form

$$q(z) = c (z - r_1)^{\delta_1} \cdots (z - r_M)^{\delta_M} (z - \lambda_j)^{\delta},$$

where $c \in \mathbf{C}$, the $r_k$'s are complex numbers all different from $\lambda_j$, the $\delta_k$'s are
positive integers, and $\delta$ is a non-negative integer. If $c = 0$, we are done, so assume
that $c \ne 0$. Suppose $v \in U_j$. Then $(T - \lambda_j I)^{\delta} v$ is also in $U_j$ (by Theorem 3.11(b)).
Now

$$c (T - r_1 I)^{\delta_1} \cdots (T - r_M I)^{\delta_M} (T - \lambda_j I)^{\delta} v = q(T)v = 0,$$

and $(T - r_1 I)^{\delta_1} \cdots (T - r_M I)^{\delta_M}$ is injective on $U_j$ (by Theorem 3.11(d)). Thus
$(T - \lambda_j I)^{\delta} v = 0$. Because v was an arbitrary element of $U_j$, this implies that
$\alpha_j \le \delta$. Thus q is a polynomial multiple of $(z - \lambda_j)^{\alpha_j}$, and (c) holds.

To prove (a), suppose v is a vector in some $U_j$. If we commute the terms of
$(T - \lambda_1 I)^{\alpha_1} \cdots (T - \lambda_m I)^{\alpha_m}$ (which equals $p(T)$) so that $(T - \lambda_j I)^{\alpha_j}$ is on the
right, we see that $p(T)v = 0$. Because $U_1, \dots, U_m$ span V (Theorem 3.11(a)), we
conclude that $p(T) = 0$. In other words, p is a monic polynomial that annihilates
T. We know from (c) that no monic polynomial of lower degree has this property.
Thus p must be the minimal polynomial of T, completing the proof. ∎


Note that by avoiding determinants we have been naturally led to the descrip- 
tion of the minimal polynomial in terms of generalized eigenvectors. 


5. MULTIPLICITY AND THE CHARACTERISTIC POLYNOMIAL. The
multiplicity of an eigenvalue $\lambda$ of T is defined to be the dimension of the set of
generalized eigenvectors of T corresponding to $\lambda$. We see immediately that the
sum of the multiplicities of all eigenvalues of T equals n, the dimension of V
(from Theorem 3.11(a)). Note that the definition of multiplicity given here has a
clear connection with the geometric behavior of T, whereas the usual definition (as
the multiplicity of a root of the polynomial $\det(zI - T)$) describes an object
without obvious meaning.
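
By Lemma 3.1, this multiplicity is $\dim \ker (T - \lambda I)^n$, which can be computed
as n minus the rank of $(T - \lambda I)^n$ (the dimension count used in (3.7)). The
Python sketch below does exactly that with exact rational arithmetic. It is an
illustration under stated assumptions: the function names rank and
multiplicity and the sample matrix are not from the text.

```python
from fractions import Fraction


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(m):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for col in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def multiplicity(t, lam):
    """dim ker (T - lam I)^n, computed as n - rank((T - lam I)^n)."""
    n = len(t)
    shifted = [[t[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    power = shifted
    for _ in range(n - 1):
        power = mat_mul(power, shifted)
    return n - rank(power)


T = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]
print([multiplicity(T, lam) for lam in (2, 3, 5)])
```

A value of 0 means that the number tested is not an eigenvalue; the non-zero
multiplicities add up to the dimension of the space.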

Let $\lambda_1, \dots, \lambda_m$ denote the distinct eigenvalues of T, with corresponding
multiplicities $\beta_1, \dots, \beta_m$. The polynomial

$$(z - \lambda_1)^{\beta_1} \cdots (z - \lambda_m)^{\beta_m} \tag{5.1}$$

is called the characteristic polynomial of T. Clearly, it is a polynomial of degree n.

Of course the usual definition of the characteristic polynomial involves a
determinant; the characteristic polynomial is then used to prove the existence of
eigenvalues. Without mentioning determinants, we have reversed that procedure.
We first showed that T has n eigenvalues, counting multiplicities, and then used
that to give a more natural definition of the characteristic polynomial ("counting
multiplicities" means that each eigenvalue is repeated as many times as its
multiplicity).

The next result is called the Cayley-Hamilton Theorem. With the approach 
taken here, its proof is easy. 


Theorem 5.2. Let q denote the characteristic polynomial of T. Then q(T) = 0. 


Proof: Let $U_j$ and $\alpha_j$ be as in Theorem 4.1, and let $\beta_j$ equal the dimension of $U_j$.
As we noted earlier, $\alpha_j \le \beta_j$ (by Lemma 3.1 applied to $T|_{U_j}$). Hence the
characteristic polynomial (5.1) is a polynomial multiple of the minimal polynomial (4.2).
Thus the characteristic polynomial must annihilate T. ∎
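
The relation between the two polynomials can be checked on a small matrix. In
the Python sketch below, which is illustrative and not from the text, the
matrix T is an assumption chosen so that the eigenvalue 2 has multiplicity 3
but exponent 2 in the minimal polynomial, and the eigenvalue 3 has
multiplicity 1; the helper names are likewise hypothetical.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shift(t, lam):
    """Return T - lam * I."""
    return [[t[i][j] - (lam if i == j else 0) for j in range(len(t))]
            for i in range(len(t))]


T = [[2, 1, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3]]
A, B = shift(T, 2), shift(T, 3)
zero = [[0] * 4 for _ in range(4)]

minimal_at_T = mat_mul(mat_mul(A, A), B)        # (T - 2I)^2 (T - 3I)
characteristic_at_T = mat_mul(A, minimal_at_T)  # (T - 2I)^3 (T - 3I)
too_low_at_T = mat_mul(A, B)                    # (T - 2I)(T - 3I)

print(minimal_at_T == zero, characteristic_at_T == zero, too_low_at_T == zero)
```

The characteristic polynomial annihilates T because it is a multiple of the
minimal polynomial, while the lower-degree product does not.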


6. UPPER-TRIANGULAR FORM. A square matrix is called upper-triangular if all 
the entries below the main diagonal are 0. Our next goal is to show that each linear 
operator has an upper-triangular matrix for some choice of basis. We’ll begin with 
nilpotent operators; our main structure theorem will then easily give the result for 
arbitrary linear operators.


Lemma 6.1. Suppose T is nilpotent. Then there is a basis of V with respect to which
the matrix of T contains only 0’s on and below the main diagonal.


Proof: First choose a basis of $\ker T$. Then extend this to a basis of $\ker T^2$. Then
extend to a basis of $\ker T^3$. Continue in this fashion, eventually getting a basis of
V. The matrix of T with respect to this basis clearly has the desired form. ∎


By avoiding determinants and focusing on generalized eigenvectors, we can now 
give a simple proof that every linear operator can be put in upper-triangular form. 
We actually get a better result, because the matrix in the next theorem has many 
more 0’s than required for upper-triangular form. 


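Theorem 6.2. Let $\lambda_1, \dots, \lambda_m$ be the distinct eigenvalues of T. Then there is a basis of
V with respect to which the matrix of T has the form

$$\begin{pmatrix}
\begin{matrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_1 \end{matrix} & & 0 \\
& \ddots & \\
0 & & \begin{matrix} \lambda_m & & * \\ & \ddots & \\ 0 & & \lambda_m \end{matrix}
\end{pmatrix}.$$


Proof: This follows immediately from Theorem 3.11 and Lemma 6.1. ∎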


For many traditional uses of the Jordan form, the theorem above can be used 
instead. If Jordan form really is needed, then many standard proofs show (without 
determinants) that every nilpotent operator can be put in Jordan form. The result 
for general linear operators then follows from Theorem 3.11. 


7. THE SPECTRAL THEOREM. In this section we assume that $\langle\ ,\ \rangle$ is an inner
product on V. The nicest linear operators on V are those for which there is an
orthonormal basis of V consisting of eigenvectors. With respect to any such basis,
the matrix of the linear operator is diagonal, meaning that it is 0 everywhere
except along the main diagonal, which must contain the eigenvalues. The Spectral
Theorem, which we’ll prove in this section, describes precisely those linear
operators for which there is an orthonormal basis of V consisting of eigenvectors.
Recall that the adjoint of T is the unique linear operator $T^*$ on V such that

$$\langle Tu, v \rangle = \langle u, T^* v \rangle$$

for all $u, v \in V$. The linear operator T is called normal if T commutes with its
adjoint; in other words, T is normal if $TT^* = T^*T$. The linear operator T is called
self-adjoint if $T = T^*$. Obviously, every self-adjoint operator is normal. We’ll see
that the normal operators are precisely the ones that can be diagonalized by an
orthonormal basis. Before proving that, we need a few preliminary results. Note
that the next lemma is trivial if T is self-adjoint.


Lemma 7.1. If T is normal, then ker T = ker T*. 


Proof: If T is normal and $v \in V$, then

$$\langle Tv, Tv \rangle = \langle T^*Tv, v \rangle = \langle TT^*v, v \rangle = \langle T^*v, T^*v \rangle.$$

Thus $Tv = 0$ if and only if $T^*v = 0$. ∎


The next proposition, combined with our result that the generalized eigenvec- 
tors of a linear operator span the domain (Proposition 3.4), shows that the 
eigenvectors of a normal operator span the domain. 


Proposition 7.2. Every generalized eigenvector of a normal operator is an eigenvector 
of the operator. 


Proof: Suppose T is normal. We will prove that

$$\ker T^k = \ker T \tag{7.3}$$

for every positive integer k. This will complete the proof of the proposition,
because we can replace T in (7.3) by $T - \lambda I$ for arbitrary $\lambda \in \mathbf{C}$.

We prove (7.3) by induction on k. Clearly, the result holds for $k = 1$. Suppose
now that k is a positive integer such that (7.3) holds. Let $v \in \ker T^{k+1}$. Then
$T(T^k v) = T^{k+1} v = 0$. In other words, $T^k v \in \ker T$, and so $T^*(T^k v) = 0$ (by
Lemma 7.1). Thus

$$0 = \langle T^*(T^k v), T^{k-1} v \rangle = \langle T^k v, T^k v \rangle.$$

Hence $v \in \ker T^k$, which implies that $v \in \ker T$ (by our induction hypothesis).
Thus $\ker T^{k+1} = \ker T$, completing the induction. ∎


The last proposition, together with Proposition 3.4, implies that a normal 
operator can be diagonalized by some basis. The next proposition will be used to 
show that this can be done by an orthonormal basis. 


Proposition 7.4. Eigenvectors of a normal operator corresponding to distinct eigen- 
values are orthogonal. 


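Proof: Suppose T is normal and $\alpha, \lambda$ are distinct eigenvalues of T, with
corresponding eigenvectors u, v. Thus $(T - \lambda I)v = 0$, and so $(T^* - \bar{\lambda} I)v = 0$ (by
Lemma 7.1 applied to the normal operator $T - \lambda I$, whose adjoint is $T^* - \bar{\lambda} I$).
In other words, v is also an eigenvector of $T^*$, with eigenvalue $\bar{\lambda}$.
Now

$$\begin{aligned}
(\alpha - \lambda)\langle u, v \rangle &= \langle \alpha u, v \rangle - \langle u, \bar{\lambda} v \rangle \\
&= \langle Tu, v \rangle - \langle u, T^* v \rangle \\
&= \langle Tu, v \rangle - \langle Tu, v \rangle \\
&= 0.
\end{aligned}$$

Thus $\langle u, v \rangle = 0$, as desired. ∎

Now we can put everything together, getting the finite-dimensional Spectral
Theorem for complex inner product spaces.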


Theorem 7.5. There is an orthonormal basis of V consisting of eigenvectors of T if 
and only if T is normal. 


Proof: To prove the easy direction, first suppose that there is an orthonormal basis
of V consisting of eigenvectors of T. With respect to that basis, T has a diagonal
matrix. The matrix of $T^*$ (with respect to the same basis) is obtained by taking the
conjugate transpose of the matrix of T; hence $T^*$ also has a diagonal matrix. Any
two diagonal matrices commute. Thus T commutes with $T^*$, which means that T is
normal.

To prove the other direction, now suppose that T is normal. For each
eigenvalue of T, choose an orthonormal basis of the associated set of eigenvectors. The
union of these bases (one for each eigenvalue) is still an orthonormal set, because
eigenvectors corresponding to distinct eigenvalues are orthogonal (by Proposition
7.4). The span of this union includes every eigenvector of T (by construction), and
hence every generalized eigenvector of T (by Proposition 7.2). But the generalized
eigenvectors of T span V (by Proposition 3.4), and so we have an orthonormal
basis of V consisting of eigenvectors of T. ∎
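
A small numerical check may make the theorem concrete. The Python sketch below
is illustrative only: the rotation matrix T, the vectors u and v, and the
helper names are assumptions, not taken from the text. It uses an operator
that is normal but not self-adjoint and confirms that its eigenvectors for
distinct eigenvalues are orthogonal.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def inner(u, v):
    """Inner product on C^n, linear in the first slot."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


T = [[0, -1],
     [1, 0]]      # rotation by 90 degrees
u = [1, 1j]       # eigenvector for the eigenvalue -i
v = [1, -1j]      # eigenvector for the eigenvalue  i

print(mat_mul(T, adjoint(T)) == mat_mul(adjoint(T), T))  # T is normal
print(mat_vec(T, v) == [1j * x for x in v])              # Tv = i v
print(inner(u, v))                                       # orthogonality
```

Dividing u and v by their common norm, the square root of 2, would give the
orthonormal basis promised by the theorem.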


The proposition below will be needed in the next section, when we prove the 
Spectral Theorem for real inner product spaces. 


Proposition 7.6. Every eigenvalue of a self-adjoint operator is real. 


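Proof: Suppose T is self-adjoint. Let $\lambda$ be an eigenvalue of T, and let v be a
non-zero vector in V such that $Tv = \lambda v$. Then

$$\lambda \|v\|^2 = \langle \lambda v, v \rangle = \langle Tv, v \rangle = \langle v, Tv \rangle = \langle v, \lambda v \rangle = \bar{\lambda} \|v\|^2.$$

Thus $\lambda = \bar{\lambda}$, which means that $\lambda$ is real, as desired. ∎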


8. GETTING REAL. So far we have been dealing only with complex vector spaces. 
As we'll see, a real vector space U can be embedded, in a natural way, in a 
complex vector space called the complexification of U. Each linear operator on U 
can be extended to a linear operator on the complexification of U. Our results 
about linear operators on complex vector spaces can then be translated to 
information about linear operators on real vector spaces. Let’s see how this 
process works. 

Suppose that U is a real vector space. As a set, the complexification of U,
denoted $U_{\mathbf{C}}$, equals $U \times U$. Formally, a typical element of $U_{\mathbf{C}}$ is an ordered pair
(u, v), where $u, v \in U$, but we will write this as $u + iv$, for obvious reasons. We
define addition on $U_{\mathbf{C}}$ by

$$(u_1 + iv_1) + (u_2 + iv_2) = (u_1 + u_2) + i(v_1 + v_2).$$

The notation shows how we should define multiplication by complex scalars on $U_{\mathbf{C}}$:

$$(a + ib)(u + iv) = (au - bv) + i(av + bu)$$

for $a, b \in \mathbf{R}$ and $u, v \in U$. With these definitions of addition and scalar
multiplication, $U_{\mathbf{C}}$ becomes a complex vector space. We can think of U as a subset of $U_{\mathbf{C}}$
by identifying $u \in U$ with $u + i0$. Clearly, any basis of the real vector space U is
also a basis of the complex vector space $U_{\mathbf{C}}$. Hence the dimension of U as a real
vector space equals the dimension of $U_{\mathbf{C}}$ as a complex vector space.

For S a linear operator on a real vector space U, the complexification of S,
denoted $S_{\mathbf{C}}$, is the linear operator on $U_{\mathbf{C}}$ defined by

$$S_{\mathbf{C}}(u + iv) = Su + iSv$$

for $u, v \in U$. If we choose a basis of U and also think of it as a basis of $U_{\mathbf{C}}$, then
clearly S and $S_{\mathbf{C}}$ have the same matrix with respect to this basis.
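
The definitions of $U_{\mathbf{C}}$ and $S_{\mathbf{C}}$ translate almost word for word into code. The
following Python sketch is an illustration only; it takes U to be $\mathbf{R}^2$, and the
names CVec, scale, and complexify are assumptions introduced here, not
notation from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CVec:
    """An element u + iv of the complexification; u and v are real tuples."""
    u: tuple
    v: tuple

    def __add__(self, other):
        return CVec(tuple(a + b for a, b in zip(self.u, other.u)),
                    tuple(a + b for a, b in zip(self.v, other.v)))

    def scale(self, z):
        """Multiply by z = a + ib: (a + ib)(u + iv) = (au - bv) + i(av + bu)."""
        a, b = z.real, z.imag
        return CVec(tuple(a * x - b * y for x, y in zip(self.u, self.v)),
                    tuple(a * y + b * x for x, y in zip(self.u, self.v)))


def complexify(s):
    """Return the map S_C(u + iv) = Su + iSv for a real matrix s."""
    def act(x):
        return tuple(sum(row[j] * x[j] for j in range(len(x))) for row in s)

    def apply(w):
        return CVec(act(w.u), act(w.v))

    return apply


S = [[0.0, -1.0],
     [1.0, 0.0]]
S_C = complexify(S)
w = CVec((1.0, 2.0), (3.0, 4.0))

print(w.scale(1j).scale(1j) == w.scale(-1))            # i * i = -1
print(S_C(w.scale(2 + 3j)) == S_C(w).scale(2 + 3j))    # S_C is complex-linear
```

The second check is the reason $S_{\mathbf{C}}$ deserves to be called a linear operator on
the complex vector space $U_{\mathbf{C}}$.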

Note that any real eigenvalue of $S_{\mathbf{C}}$ is also an eigenvalue of S (because if $a \in \mathbf{R}$
and $S_{\mathbf{C}}(u + iv) = a(u + iv)$, then $Su = au$ and $Sv = av$). Non-real eigenvalues of
$S_{\mathbf{C}}$ come in pairs. More precisely,

$$(S_{\mathbf{C}} - \lambda I)^j (u + iv) = 0 \iff (S_{\mathbf{C}} - \bar{\lambda} I)^j (u - iv) = 0 \tag{8.1}$$

for j a positive integer, $\lambda \in \mathbf{C}$, and $u, v \in U$, as is easily proved by induction on j.
In particular, if $\lambda \in \mathbf{C}$ is an eigenvalue of $S_{\mathbf{C}}$, then so is $\bar{\lambda}$, and the multiplicity of
$\bar{\lambda}$ (recall that this is defined as the dimension of the set of generalized eigenvectors
of $S_{\mathbf{C}}$ corresponding to $\bar{\lambda}$) is the same as the multiplicity of $\lambda$. Because the sum of
the multiplicities of all the eigenvalues of $S_{\mathbf{C}}$ equals the (complex) dimension of $U_{\mathbf{C}}$
(by Theorem 3.11(a)), we see that if $U_{\mathbf{C}}$ has odd (complex) dimension, then $S_{\mathbf{C}}$
must have a real eigenvalue. Putting all this together, we have proved the following
theorem. Once again, a proof without determinants offers more insight into why
the result holds than the standard proof using determinants.


Theorem 8.2. Every linear operator on an odd-dimensional real vector space has a 
real eigenvalue. 


The minimal and characteristic polynomials of a linear operator S on a real
vector space are defined to be the corresponding polynomials of the
complexification $S_{\mathbf{C}}$. Both these polynomials have real coefficients; this follows from our
definitions of minimal and characteristic polynomials and (8.1). The reader should
be able to derive the properties of these polynomials easily from the corresponding
results on complex vector spaces (Theorems 4.1 and 5.2).

Our procedure for transferring results from complex vector spaces to real vector
spaces can also be used to prove the real Spectral Theorem. To see how that
works, suppose now that U is a real inner product space with inner product $\langle\ ,\ \rangle$.
We make the complexification $U_{\mathbf{C}}$ into a complex inner product space by defining
an inner product on $U_{\mathbf{C}}$ in the obvious way:

$$\langle u_1 + iv_1, u_2 + iv_2 \rangle = \langle u_1, u_2 \rangle + \langle v_1, v_2 \rangle + i\langle v_1, u_2 \rangle - i\langle u_1, v_2 \rangle.$$

Note that any orthonormal basis of the real inner product space U is also an
orthonormal basis of the complex inner product space $U_{\mathbf{C}}$.

If S is a self-adjoint operator on U, then obviously $S_{\mathbf{C}}$ is self-adjoint on $U_{\mathbf{C}}$. We
can then apply the complex Spectral Theorem (Theorem 7.5) to $S_{\mathbf{C}}$ and transfer to
U, getting the real Spectral Theorem. The next theorem gives the formal statement
of the result and the details of the proof.


Theorem 8.3. Suppose U is a real inner product space and S is a linear operator on 
U. Then there is an orthonormal basis of U consisting of eigenvectors of S if and only 
if S is self-adjoint.


Proof: To prove the easy direction, first suppose that there is an orthonormal basis
of U consisting of eigenvectors of S. With respect to that basis, S has a diagonal
matrix. Clearly, the matrix of $S^*$ (with respect to the same basis) equals the matrix
of S. Thus S is self-adjoint.

To prove the other direction, now suppose that S is self-adjoint. As noted
above, this implies that $S_{\mathbf{C}}$ is self-adjoint on $U_{\mathbf{C}}$. Thus there is a basis

$$\{u_1 + iv_1, \dots, u_n + iv_n\} \tag{8.4}$$

of $U_{\mathbf{C}}$ consisting of eigenvectors of $S_{\mathbf{C}}$ (by the complex Spectral Theorem, which is
Theorem 7.5); here each $u_j$ and $v_j$ is in U. Each eigenvalue of $S_{\mathbf{C}}$ is real
(Proposition 7.6), and thus each $u_j$ and each $v_j$ is an eigenvector of S. Clearly,
$\{u_1, v_1, \dots, u_n, v_n\}$ spans U (because (8.4) is a basis of $U_{\mathbf{C}}$). Conclusion: The
eigenvectors of S span U.

For each eigenvalue of S, choose an orthonormal basis of the associated set of
eigenvectors in U. The union of these bases (one for each eigenvalue) is still
orthonormal, because eigenvectors corresponding to distinct eigenvalues are
orthogonal (Proposition 7.4). The span of this union includes every eigenvector of S
(by construction). We have just seen that the eigenvectors of S span U, and so we
have an orthonormal basis of U consisting of eigenvectors of S, as desired. ∎


9. DETERMINANTS. At this stage we have proved most of the major structure 
theorems of linear algebra without even defining determinants. In this section we 
will give a simple definition of determinants, whose main reasonable use in 
undergraduate mathematics is in the change of variables formula for multi-variable 
integrals. 

The constant term of the characteristic polynomial of T is plus or minus the
product of the eigenvalues of T, counting multiplicity (this is obvious from our
definition of the characteristic polynomial). Let’s look at some additional
motivation for studying the product of the eigenvalues.

Suppose we want to know how to make a change of variables in a multi-variable
integral over some subset of $\mathbf{R}^n$. After linearization, this reduces to the question of
how a linear operator S on $\mathbf{R}^n$ changes volumes. Let’s consider the special case
where S is self-adjoint. Then there is an orthonormal basis of $\mathbf{R}^n$ consisting of
eigenvectors of S (by the real Spectral Theorem, which is Theorem 8.3). A
moment’s thought about the geometry of an orthonormal basis of eigenvectors
shows that if E is a subset of $\mathbf{R}^n$, then the volume (whatever that means) of S(E)
must equal the volume of E multiplied by the absolute value of the product of the
eigenvalues of S, counting multiplicity. We'll prove later that a similar result holds
even for non-self-adjoint operators. At any rate, we see that the product of the
eigenvalues seems to be an interesting object. An arbitrary linear operator on a
real vector space need not have any eigenvalues, so we will return to our familiar
setting of a linear operator T on a complex vector space V. After getting the basic
results on complex vector spaces, we’ll deal with real vector spaces by using the
notion of complexification discussed earlier.

Now we are ready for the formal definition. The determinant of T, denoted
det T, is defined to be the product of the eigenvalues of T, counting multiplicity.
This definition would not be possible with the traditional approach to eigenvalues,
because that method uses determinants to prove that eigenvalues exist. With the
techniques used here, we already know (by Theorem 3.11(a)) that T has dim V
eigenvalues, counting multiplicity. Thus our simple definition makes sense.
In addition to simplicity, our definition also makes transparent the following
result, which is not at all obvious from the standard definition.


Theorem 9.1. An operator is invertible if and only if its determinant is non-zero. 


Proof: Clearly, T is invertible if and only if 0 is not an eigenvalue of T, and this
happens if and only if $\det T \ne 0$. ∎


With our definition of determinant and characteristic polynomial, we see
immediately that the constant term of the characteristic polynomial of T equals
$(-1)^n \det T$, where $n = \dim V$. The next result shows that even more is true: our
definitions are consistent with the usual ones.


Proposition 9.2. The characteristic polynomial of T equals det(zI — T). 


Proof: Let $\lambda_1, \dots, \lambda_m$ denote the eigenvalues of T, with multiplicities $\beta_1, \dots, \beta_m$.
Thus for $z \in \mathbf{C}$, the eigenvalues of $zI - T$ are $z - \lambda_1, \dots, z - \lambda_m$, with
multiplicities $\beta_1, \dots, \beta_m$. Hence the determinant of $zI - T$ is the product

$$(z - \lambda_1)^{\beta_1} \cdots (z - \lambda_m)^{\beta_m},$$

which equals the characteristic polynomial of T. ∎


Note that determinant is a similarity invariant. In other words, if S is an
invertible linear operator on V, then T and $STS^{-1}$ have the same determinant
(because they have the same eigenvalues, counting multiplicity).

We define the determinant of a square matrix of complex numbers to be the 
determinant of the corresponding linear operator (with respect to some choice of 
basis, which doesn’t matter, because two different bases give rise to two linear 
operators that are similar and hence have the same determinant). Fix a basis of V, 
and for the rest of this section let’s identify linear operators on V with matrices 
with respect to that basis. How can we find the determinant of T from its matrix,
without finding all the eigenvalues? Although getting the answer to that question 
will be hard, the method used below will show how someone might have discovered 
the formula for the determinant of a matrix. Even with the derivation that follows, 
determinants are difficult, which is precisely why they should be avoided. 

We begin our search for a formula for the determinant by considering matrices
of a special form. Let $a_1, \dots, a_n \in \mathbf{C}$. Consider a linear operator T whose matrix
is

$$\begin{pmatrix}
0 & & & & a_n \\
a_1 & 0 & & & \\
 & a_2 & 0 & & \\
 & & \ddots & \ddots & \\
 & & & a_{n-1} & 0
\end{pmatrix}; \tag{9.3}$$

here all entries of the matrix are 0 except for the upper right-hand corner and
along the line just below the main diagonal. Let’s find the determinant of T. Note
that $T^n = a_1 \cdots a_n I$. Because the first columns of $\{I, T, \dots, T^{n-1}\}$ are linearly
independent (assuming that none of the $a_j$ is 0), no polynomial of degree less than
n can annihilate T. Thus $z^n - a_1 \cdots a_n$ is the minimal polynomial of T. Hence
$z^n - a_1 \cdots a_n$ is also the characteristic polynomial of T. Thus

$$\det T = (-1)^{n-1} a_1 \cdots a_n.$$

(If some $a_j$ is 0, then clearly T is not invertible, so $\det T = 0$, and the same
formula holds.)

Now let $\pi$ be a permutation of $\{1, \dots, n\}$, and consider a matrix T whose $j$th
column consists of all zeroes except for $a_j$ in the $\pi(j)$th row. The permutation $\pi$ is
a product of cyclic permutations. Thus T is similar to (and so has the same
determinant as) a block diagonal matrix where each block of size greater than one
has the form of (9.3). The determinant of a block diagonal matrix is obviously the
product of the determinants of the blocks, and we know from the last paragraph
how to compute those. Thus we see that $\det T = (\operatorname{sign} \pi) a_1 \cdots a_n$. To put this
into a form that does not depend upon the particular permutation $\pi$, let $t_{i,j}$ denote
the entry in row i, column j, of T (so $t_{i,j} = 0$ unless $i = \pi(j)$), and let P(n) denote
the set of all permutations of $\{1, \dots, n\}$. Then

$$\det T = \sum_{\pi \in P(n)} (\operatorname{sign} \pi)\, t_{\pi(1),1} \cdots t_{\pi(n),n}, \tag{9.4}$$

because each summand is 0 except the one corresponding to the permutation $\pi$.
Consider now an arbitrary matrix T with entries $t_{i,j}$. Using the paragraph above
as motivation, we guess that the formula for det T is given by (9.4). The next
proposition shows that this guess is correct and gives the usual formula for the
determinant of a matrix.

Proposition 9.5. $\det T = \sum_{\pi \in P(n)} (\operatorname{sign} \pi)\, t_{\pi(1),1} \cdots t_{\pi(n),n}$.


Proof: Define a function d on the set of $n \times n$ matrices by

$$d(T) = \sum_{\pi \in P(n)} (\operatorname{sign} \pi)\, t_{\pi(1),1} \cdots t_{\pi(n),n}.$$

We want to prove that $\det T = d(T)$. To do this, choose S so that $STS^{-1}$ is in the
upper triangular form given by Theorem 6.2. Now $d(STS^{-1})$ equals the product of
the entries on the main diagonal of $STS^{-1}$ (because only the identity permutation
makes a non-zero contribution to the sum defining $d(STS^{-1})$). But the entries on
the main diagonal of $STS^{-1}$ are precisely the eigenvalues of T, counting
multiplicity, so $\det T = d(STS^{-1})$. Thus to complete the proof, we need only show that d is
a similarity invariant; then we will have $\det T = d(STS^{-1}) = d(T)$.

To show that d is a similarity invariant, first prove that d is multiplicative,
meaning that $d(AB) = d(A)d(B)$ for all $n \times n$ matrices A and B. The proof that
d is multiplicative, which will not be given here, consists of a straightforward
rearrangement of terms appearing in the formula defining d(AB) (see any text
that defines det(T) to be d(T) and then proves that det AB = (det A)(det B)).
The multiplicativity of d now leads to a proof that d is a similarity invariant, as
follows:

$$d(STS^{-1}) = d(ST)\,d(S^{-1}) = d(S^{-1})\,d(ST) = d(S^{-1}ST) = d(T).$$

Thus $\det T = d(T)$, as claimed. ∎
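
Formula (9.4) is easy to evaluate directly for small matrices. The Python sketch
below is an illustration, not part of the original text; the names sign, d, and
cyclic and the sample entries are assumptions. It evaluates the sum over
permutations and compares it with the value found above for matrices of the
form (9.3) and with the product of the diagonal for an upper-triangular matrix.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of 0..n-1, found by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def d(t):
    """The sum in (9.4): over all pi, sign(pi) * t[pi(1)][1] * ... * t[pi(n)][n]."""
    n = len(t)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= t[perm[col]][col]
        total += term
    return total


def cyclic(a):
    """Matrix of the form (9.3): a_1..a_{n-1} below the diagonal, a_n top right."""
    n = len(a)
    m = [[0] * n for _ in range(n)]
    for j in range(n - 1):
        m[j + 1][j] = a[j]
    m[0][n - 1] = a[n - 1]
    return m


print(d(cyclic([2, 3, 5])))                    # (-1)^(3-1) * 2*3*5
print(d(cyclic([2, 3, 5, 7])))                 # (-1)^(4-1) * 2*3*5*7
print(d([[2, 5, 7], [0, 2, 1], [0, 0, 3]]))    # product of the diagonal
```

The sum has n! terms, so this is useful only as a check on small examples.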

All the usual properties of determinants can be proved either from the (new)
definition or from Proposition 9.5. In particular, the last proof shows that det is
multiplicative.

The determinant of a linear operator on a real vector space is defined to be the
determinant (product of the eigenvalues) of its complexification. Proposition 9.5
holds on real as well as complex vector spaces. To see this, suppose that U is a real
vector space and S is a linear operator on U. If we choose a basis of U and also
think of it as a basis of the complexification $U_{\mathbf{C}}$, then S and its complexification $S_{\mathbf{C}}$
have the same matrix with respect to this basis. Thus the formula for det S, which
by definition equals $\det S_{\mathbf{C}}$, is given by Proposition 9.5. In particular, det S is real.
The multiplicativity of det on linear operators on a real vector space follows from
the corresponding property on complex vector spaces and the multiplicativity of
complexification: $(AB)_{\mathbf{C}} = A_{\mathbf{C}} B_{\mathbf{C}}$ whenever A and B are linear operators on a
real vector space.

The tools we’ve developed provide a natural connection between determinants
and volumes in $\mathbf{R}^n$. To understand that connection, first we need to explain what is
meant by the square root of an operator times its adjoint. Suppose S is a linear
operator on a real inner product space U. If $\lambda$ is an eigenvalue of $S^*S$ and $u \in U$ is a
corresponding non-zero eigenvector, then

$$\lambda \langle u, u \rangle = \langle \lambda u, u \rangle = \langle S^*Su, u \rangle = \langle Su, Su \rangle,$$

and thus $\lambda$ must be a non-negative number. Clearly, $S^*S$ is self-adjoint, and so
there is an orthonormal basis of U consisting of eigenvectors of $S^*S$ (by the real Spectral
Theorem, which is Theorem 8.3). We can think of $S^*S$ as a diagonal matrix with
respect to this basis. The entries on the diagonal, namely the eigenvalues of $S^*S$,
are all non-negative, as we have just seen. The square root of $S^*S$, denoted $\sqrt{S^*S}$,
is the linear operator on U corresponding to the diagonal matrix obtained by
taking the non-negative square root of each entry of the matrix of $S^*S$. Obviously,
$\sqrt{S^*S}$ is self-adjoint, and its square equals $S^*S$. Also, the multiplicativity of det
shows that

$$\bigl(\det \sqrt{S^*S}\bigr)^2 = \det(S^*S) = (\det S^*)(\det S) = (\det S)^2.$$

Thus $\det \sqrt{S^*S} = |\det S|$ (because $\det \sqrt{S^*S}$ must be non-negative).
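
As a small worked instance of this identity (an illustration only; the
particular matrix S is an assumption chosen so that $S^*S$ is already diagonal in
the standard basis of $\mathbf{R}^2$), take

```latex
S = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \qquad
S^* S = \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
        \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}
      = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}, \qquad
\sqrt{S^* S} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.
```

The eigenvalues of S are $\sqrt{2}$ and $-\sqrt{2}$, so $\det S = -2$, while
$\det \sqrt{S^*S} = 1 \cdot 2 = 2 = |\det S|$.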

The next lemma provides the tool we will use to reduce the question of volume
change by a linear operator to the self-adjoint case. It is called the polar
decomposition of an operator S, because it resembles the polar decomposition of a
complex number $z = e^{i\theta} r$. Here r equals $\sqrt{\bar{z} z}$ (analogous to $\sqrt{S^*S}$ in the lemma),
and multiplication by $e^{i\theta}$ is an isometry on $\mathbf{C}$ (analogous to the isometric property
of A in the lemma).


Lemma 9.6. Let S be a linear operator on a real inner product space U. Then there 
exists a linear isometry A on U such that $S = A\sqrt{S^*S}$.


Proof: For $u \in U$ we have

$$\|\sqrt{S^*S}\,u\|^2 = \langle \sqrt{S^*S}\,u, \sqrt{S^*S}\,u\rangle = \langle S^*Su, u\rangle = \langle Su, Su\rangle = \|Su\|^2.$$

In other words, $\|\sqrt{S^*S}\,u\| = \|Su\|$. Thus the function A defined on $\operatorname{ran}\sqrt{S^*S}$ by
$A(\sqrt{S^*S}\,u) = Su$ is well defined and is a linear isometry from $\operatorname{ran}\sqrt{S^*S}$ onto ran S.
Extend A to a linear isometry of U onto U by first extending A to be any isometry
of $(\operatorname{ran}\sqrt{S^*S})^{\perp}$ onto $(\operatorname{ran} S)^{\perp}$ (these two spaces have the same dimension,
because we have just seen that there is a linear isometry of $\operatorname{ran}\sqrt{S^*S}$ onto ran S),
and then extend A to all of U by linearity (with the Pythagorean Theorem
showing that A is an isometry on all of U). The construction of A shows that
$S = A\sqrt{S^*S}$, as desired. ∎
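
The lemma can be checked numerically in the smallest interesting case. The following is only a sketch for one invertible 2-by-2 matrix: the matrix S, the helper names, and the closed form used for the square root of a 2-by-2 positive definite matrix are assumptions of this illustration and are not taken from the paper.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inverse(X):
    d = det(X)
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def sqrt_pd(M):
    """Square root of a symmetric positive definite 2x2 matrix M.

    Closed form: sqrt(M) = (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M)).
    """
    s = math.sqrt(det(M))
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

S = [[2.0, 1.0], [-1.0, 3.0]]            # arbitrary invertible example, det S = 7
P = sqrt_pd(matmul(transpose(S), S))     # plays the role of sqrt(S*S)
A = matmul(S, inverse(P))                # candidate isometry, so that S = A P
AtA = matmul(transpose(A), A)            # should be close to the identity
```

Because S is invertible here, A is forced to be S times the inverse of the square root; the extension step in the proof is only needed when S is singular.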

Now we are ready to give a clean, illuminating proof that a linear operator
changes volumes by a factor of the absolute value of the determinant. We will not
formally define volume, but only use the obvious properties that volume should
satisfy. In particular, the subsets E of $\mathbf{R}^n$ considered in the theorem below should
be restricted to whatever class the reader uses most comfortably (polyhedrons, 
open sets, or measurable sets). 


Theorem 9.7. Let S be a linear operator on $\mathbf{R}^n$. Then
$$\operatorname{vol} S(E) = |\det S| \operatorname{vol} E$$
for $E \subset \mathbf{R}^n$.


Proof: Let $S = A\sqrt{S^*S}$ be the polar decomposition of S as given by Lemma 9.6.
Let $E \subset \mathbf{R}^n$. Because A is an isometry, it does not change volumes. Thus

$$\operatorname{vol} S(E) = \operatorname{vol} A(\sqrt{S^*S}(E)) = \operatorname{vol} \sqrt{S^*S}(E).$$

But $\sqrt{S^*S}$ is self-adjoint, and we already noted at the beginning of this section that
each self-adjoint operator changes volume by a factor equal to the absolute value
of the determinant. Thus we have

$$\operatorname{vol} S(E) = \operatorname{vol} \sqrt{S^*S}(E) = |\det \sqrt{S^*S}| \operatorname{vol} E = |\det S| \operatorname{vol} E,$$

as desired. ∎
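
For polygons in the plane the theorem is easy to test, since areas can be computed exactly from vertices. The sketch below uses the shoelace formula; the particular matrix S, the pentagon E, and the function names are choices made for this illustration only.

```python
def shoelace_area(points):
    """Area of a simple polygon whose vertices are listed in order."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def apply_map(S, p):
    """Image of the point p under the 2x2 matrix S."""
    return (S[0][0] * p[0] + S[0][1] * p[1],
            S[1][0] * p[0] + S[1][1] * p[1])

S = [[2.0, 1.0], [-1.0, 3.0]]
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]

# A convex pentagon standing in for the set E.
E = [(0.0, 0.0), (4.0, 0.0), (5.0, 2.0), (2.0, 4.0), (-1.0, 1.0)]
image = [apply_map(S, p) for p in E]

# Ratio of areas; the theorem predicts |det S|.
ratio = shoelace_area(image) / shoelace_area(E)
```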


10. CONCLUSION. As mathematicians, we often read a nice new proof of a 
known theorem, enjoy the different approach, but continue to derive our internal 
understanding from the method we originally learned. This paper aims to change 
drastically the way mathematicians think about and teach crucial aspects of linear 
algebra. The simple proof of the existence of eigenvalues given in Theorem 2.1 
should be the one imprinted in our minds, written on our blackboards, and 
published in our textbooks. Generalized eigenvectors should become a central tool 
for the understanding of linear operators. As we have seen, their use leads to 
natural definitions of multiplicity and the characteristic polynomial. Every mathe- 
matician and every linear algebra student should at least remember that the 
generalized eigenvectors of an operator always span the domain (Proposition 3.4) 
—this crucial result leads to easy proofs of upper-triangular form (Theorem 6.2) 
and the Spectral Theorem (Theorems 7.5 and 8.3). 

Determinants appear in many proofs not discussed here. If you scrutinize such 
proofs, you'll often discover better alternatives without determinants. Down with 
Determinants! 
Mary Ellen Rudin, in 1970.

Edited by: John Duncan


Where Not to Find the Critical Points 
of a Polynomial— Variation on 
a Putnam Theme 


Peter Andrews 
If a real polynomial of degree n has distinct roots $r_1 < r_2 < \cdots < r_n$, then Rolle’s
Theorem tells us that the critical points $x_i$, $i = 1, 2, \ldots, n-1$, lie in the intervals
$(r_i, r_{i+1})$ respectively. This note addresses the question of what restrictions there
might be on where in these intervals the critical points can be located. Two recent
notes in this Monthly, for instance, show that the critical points cannot be any
closer together than the roots themselves [4], [1].

Along these same lines, Problem A-3 on the 1991 Putnam competition reads: 


Find all real polynomials p(x) of degree $n \ge 2$ for which there exist real
numbers $r_1 < r_2 < \cdots < r_n$ such that
1. $p(r_i) = 0$, $i = 1, 2, \ldots, n$, and
2. $p'\left(\frac{r_i + r_{i+1}}{2}\right) = 0$, $i = 1, 2, \ldots, n-1$,

where p’(x) is the derivative of p(x).


This essentially asks, “Can the roots $r_i$ be chosen so that each critical point $x_i$ is
at the midpoint of the interval $(r_i, r_{i+1})$?” When n > 2 the answer is no!

Since p(x) is a degree-n polynomial with n distinct real roots, it can be written
(at least up to a constant multiple) as $p(x) = \prod_{i=1}^{n}(x - r_i)$. The product rule
and a straightforward calculation show that $p'((r_1 + r_2)/2) = 0$ (and likewise
$p'((r_{n-1} + r_n)/2) = 0$) if and only if n = 2 [3, p. 719-720].

This tells us that, unless n = 2, neither the leftmost nor the rightmost critical 
point can be exactly halfway between the surrounding roots. A closer study reveals 
that all the critical points are bounded away from the roots in a rather intriguing 
manner. 


Theorem. If p(x) is a degree-n real polynomial with distinct roots $r_1 < r_2 < \cdots < r_n$
and critical points $x_1 < x_2 < \cdots < x_{n-1}$, then

$$\frac{1}{n-i+1} < \frac{x_i - r_i}{r_{i+1} - r_i} < \frac{i}{i+1}.$$


The main observation we need is the “Root Dragging Theorem” of [1]. It states 
that if we move any of the roots to the right (i.e. increase them) then all of the 
critical points move to the right. We will sketch an alternative proof to that of [1]. 
If $p(x) = \prod_{i=1}^{n}(x - r_i)$, then since $p'(x_k) = 0$, $x_k$ must satisfy the equation

$$x_k - r_i = -\frac{\prod_{j \ne i}(x_k - r_j)}{\sum_{l \ne i}\prod_{j \ne i, l}(x_k - r_j)}. \qquad (1)$$

If we define $u_i = x_k - r_i$ and $F_i(u_1, \ldots, u_n) = -\prod_{j \ne i} u_j \big/ \sum_{l \ne i}\prod_{j \ne i, l} u_j$, then
(1) becomes

$$u_i = F_i(u_1, \ldots, u_n). \qquad (2)$$

Moreover, $\partial F_i/\partial u_j < 0$ for $i \ne j$. This means that if we increase $r_j$ ($j \ne i$) then $u_j$
decreases and $F_i$ increases. This makes $u_i < F_i(u_1, \ldots, u_n)$. If $x_k$ is now decreased
then each $u_j$ will decrease and this inequality will get even worse. Thus, the only
way we can get back to equality in (2) and hence to the critical point is by moving
$x_k$ to the right as well.
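
Both the bounds of the theorem and the root-dragging behaviour can be observed numerically. The sketch below locates each critical point by bisection on p’ between consecutive roots; the sample roots, the dragged root, and all function names are assumptions of this illustration, not part of the note.

```python
def dp(roots, x):
    """Value of p'(x) for p(x) = prod (x - r), computed by the product rule."""
    total = 0.0
    for k in range(len(roots)):
        term = 1.0
        for j, r in enumerate(roots):
            if j != k:
                term *= x - r
        total += term
    return total

def critical_points(roots, iters=200):
    """Bisect p' on each interval between consecutive (sorted, distinct) roots."""
    points = []
    for lo, hi in zip(roots, roots[1:]):
        sign_lo = dp(roots, lo) > 0      # p' is nonzero at a simple root
        for _ in range(iters):
            mid = (lo + hi) / 2
            if (dp(roots, mid) > 0) == sign_lo:
                lo = mid
            else:
                hi = mid
        points.append((lo + hi) / 2)
    return points

def ratios(roots):
    """sigma_i = (x_i - r_i) / (r_{i+1} - r_i) for each critical point x_i."""
    xs = critical_points(roots)
    return [(x - a) / (b - a) for x, a, b in zip(xs, roots, roots[1:])]

roots = [0.0, 1.0, 1.5, 4.0, 9.0]        # arbitrary distinct roots, n = 5
n = len(roots)
sigma = ratios(roots)
in_bounds = all(1 / (n - i + 1) < s < i / (i + 1)
                for i, s in enumerate(sigma, start=1))

# Root dragging: move the third root from 1.5 to 2.5 and compare.
dragged = [0.0, 1.0, 2.5, 4.0, 9.0]
moved_right = all(new > old for old, new in
                  zip(critical_points(roots), critical_points(dragged)))
```

A check on one example proves nothing, of course, but it is a quick way to catch a misremembered bound.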

Now, to return to the proof of our theorem, suppose we fix $r_i$ and $r_{i+1}$. We can
move $x_i$ as far to the right as possible by letting $r_1, \ldots, r_{i-1} \to r_i$ and
$r_{i+2}, \ldots, r_n \to \infty$. This suggests that we look at

$$q_b(x) = (x - r_i)^{i}(x - r_{i+1})(x - b)^{n-i-1} \qquad (3)$$

and let $b \to \infty$. Differentiating $q_b(x)$ shows us that the critical point $x_i$ must be the leftmost
root of the quadratic equation

$$nx^2 - ((n-1)r_{i+1} + (n-i)r_i + (i+1)b)x + (ibr_{i+1} + br_i + (n-i-1)r_{i+1}r_i) = 0. \qquad (4)$$

As $b \to \infty$, $x_i$ then approaches the leftmost (and only) root of the linear
equation

$$-(i+1)x + ir_{i+1} + r_i = 0. \qquad (5)$$

Thus, $x_i \nearrow (ir_{i+1} + r_i)/(i+1)$ and $(x_i - r_i)/(r_{i+1} - r_i) \nearrow i/(i+1)$.

Similarly, to see how far to the left in $(r_i, r_{i+1})$ the point $x_i$ could be, we look at

$$s_b(x) = (x + b)^{i-1}(x - r_i)(x - r_{i+1})^{n-i} \qquad (6)$$

and let $b \to \infty$. This time, $x_i$ decreases toward the root of the equation

$$(n-i+1)x - r_{i+1} - (n-i)r_i = 0. \qquad (7)$$

This means that $x_i \searrow (r_{i+1} + (n-i)r_i)/(n-i+1)$ and
$(x_i - r_i)/(r_{i+1} - r_i) \searrow 1/(n-i+1)$.

Notice that we are measuring the position of $x_i$ in the interval $(r_i, r_{i+1})$ by
looking at $\sigma_i = (x_i - r_i)/(r_{i+1} - r_i)$, the ratio in which $x_i$ divides $(r_i, r_{i+1})$. If we
define, for each polynomial $p(x) = \prod_{i=1}^{n}(x - r_i)$ with $r_1 < r_2 < \cdots < r_n$, the
(n − 1)-tuple of these ratios, $\sigma_p = (\sigma_1, \sigma_2, \ldots, \sigma_{n-1})$, and for each n > 2, the set

$$X_n = \prod_{i=1}^{n-1}\left(\frac{1}{n-i+1}, \frac{i}{i+1}\right), \qquad (8)$$

then our theorem can be re-stated as saying that

$$\sigma_p \in X_n, \qquad (9)$$

where n > 2 is the degree of p.

It is not clear, however, that every $(\sigma_1, \sigma_2, \ldots, \sigma_{n-1}) \in X_n$ can be realized as
the ratio vector of a degree-n polynomial with distinct roots. Suppose we define
$Y_n \subset X_n$ to be the set of those (n − 1)-tuples that are the ratio vectors of
polynomials. Now our theorem appears as

$$Y_n \subset X_n \qquad (10)$$

and the essence of the 1991 Putnam question could be re-phrased as:
For which values of n > 2 is the (n − 1)-tuple (1/2, 1/2, ..., 1/2) in $Y_n$?


Since $(1/2, 1/2, \ldots, 1/2) \notin X_n$, (10) shows that it can’t be in $Y_n$.

This still leaves the question of whether or not $Y_n = X_n$ and, if not, exactly what
does $Y_n$ look like? While we cannot answer these questions yet, we can leave you
with some pictorial evidence that $Y_n$ is a rather interesting hypersurface in $X_n$.

Since the location of the critical points relative to the roots is clearly invariant
under translation, we can assume that the smallest root, $r_1$, is at the origin. Figure
1 was produced by plotting $\sigma_p$ for the cubic polynomials $p(x) = x(x - r_2)(x - r_3)$,
as $r_2$ ranges from 0.1 to 2.0 in steps of 0.1 and $r_3$ ranges from $r_2 + 0.1$ to 4.0 in
steps of 0.1. This picture inspires one to look for a single equation satisfied by $\sigma_1$
and $\sigma_2$. In fact, it is not too hard to show that

$$(1 - \sigma_1)\sigma_2 = \tfrac{1}{3}. \qquad (11)$$

To see this, translate p so that $r_2 = 0$. Then $p'(x) = 3x^2 - 2(r_1 + r_3)x + r_1r_3$.
From this, we see that the product of the two roots of p’ is $r_1r_3/3$. However, the
roots of p’ are $(1 - \sigma_1)r_1$ and $\sigma_2r_3$.¹

Figure 2 was produced in a similar manner but using quartic polynomials of the
form $p(x) = x(x - r_2)(x - r_3)(x - r_4)$. The critical points were approximated by
numerically solving the cubic equation p’(x) = 0 using Maple’s fsolve procedure.
This time $Y_4$ clearly appears to be a smooth surface in $X_4$.
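
Relation (11) can be confirmed on the same grid of cubics described for Figure 1. The sketch below solves p’(x) = 0 by the quadratic formula rather than with Maple; the function name and the error measure are assumptions of this illustration.

```python
import math

def sigma_pair(r2, r3):
    """Ratios (sigma_1, sigma_2) for p(x) = x (x - r2) (x - r3) with 0 < r2 < r3."""
    b, c = r2 + r3, r2 * r3
    # p'(x) = 3x^2 - 2bx + c has roots (b -/+ sqrt(b^2 - 3c)) / 3.
    disc = math.sqrt(b * b - 3 * c)
    x1, x2 = (b - disc) / 3, (b + disc) / 3
    return x1 / r2, (x2 - r2) / (r3 - r2)

# Largest deviation of (1 - sigma_1) * sigma_2 from 1/3 over the grid.
worst = 0.0
for i in range(1, 21):                   # r2 = 0.1, 0.2, ..., 2.0
    r2 = 0.1 * i
    j = 1
    while r2 + 0.1 * j <= 4.0 + 1e-12:   # r3 = r2 + 0.1, ..., up to 4.0
        r3 = r2 + 0.1 * j
        s1, s2 = sigma_pair(r2, r3)
        worst = max(worst, abs((1 - s1) * s2 - 1 / 3))
        j += 1
```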
A Short Path to the Shortest Path


Peter D. Lax 


This note contains a demonstration of the isoperimetric inequality. Our proof is 
somewhat simpler and more straightforward than the usual ones; it is eminently 
suitable for presentation in an honors calculus course. 

1. The Isoperimetric Inequality says that a closed plane curve of length $2\pi$
encloses an area $\le \pi$. Equality holds only for a circle.

Let x(s), y(s) be the parametric presentation of the curve, s arclength, $0 \le s \le 2\pi$.
Suppose that we have so positioned the curve that the points x(0), y(0) and
$x(\pi)$, $y(\pi)$ lie on the x-axis, i.e.

$$y(0) = 0 = y(\pi). \qquad (1)$$

The area enclosed by the curve is given by the formula

$$A = \int_0^{2\pi} y\dot{x}\,ds, \qquad (2)$$

where the dot ˙ denotes differentiation with respect to s. We write this integral as
the sum $A_1 + A_2$ of an integral from 0 to $\pi$ and from $\pi$ to $2\pi$, and show that
each is $\le \pi/2$.


¹The author thanks the referee for this particularly nice derivation of (11).
According to a basic inequality,

$$ab \le \frac{a^2 + b^2}{2};$$

equality holds only when a = b. Applying this to y = a, $\dot{x}$ = b, we get

$$A_1 = \int_0^{\pi} y\dot{x}\,ds \le \frac{1}{2}\int_0^{\pi}(y^2 + \dot{x}^2)\,ds. \qquad (3)$$

Since s is arclength, $\dot{x}^2 + \dot{y}^2 = 1$; so we can rewrite (3) as

$$A_1 \le \frac{1}{2}\int_0^{\pi}(y^2 + 1 - \dot{y}^2)\,ds. \qquad (3')$$

Since y = 0 at s = 0 and $\pi$, we can factor y as

$$y(s) = u(s)\sin s, \qquad (4)$$

u bounded and differentiable. Differentiate (4):

$$\dot{y} = \dot{u}\sin s + u\cos s.$$

Setting this into (3’) gives

$$A_1 \le \frac{1}{2}\int_0^{\pi}\left[u^2(\sin^2 s - \cos^2 s) - 2u\dot{u}\sin s\cos s - \dot{u}^2\sin^2 s + 1\right]ds. \qquad (5)$$

The product $2u\dot{u}$ is the derivative of $u^2$; integrating by parts changes (5) into

$$A_1 \le \frac{1}{2}\int_0^{\pi}(1 - \dot{u}^2\sin^2 s)\,ds,$$

clearly $\le \pi/2$. Equality holds only if $\dot{u} \equiv 0$, which makes y(s) = constant · sin s.
Since equality in (3) holds only if $y = \dot{x} = \sqrt{1 - \dot{y}^2}$, $y(s) = \pm\sin s$, $x(s) = \mp\cos s$
+ constant. This is a semicircle. Q.e.d.
Courant Institute of Mathematical Sciences 
New York University 
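
As a quick sanity check of the inequality, one can look at regular polygons of perimeter $2\pi$, whose areas should stay below $\pi$ and approach it as the number of sides grows. The sketch below uses the standard area formula for a regular n-gon; the function name and the sample values of n are choices made here, and polygons are only piecewise smooth, so this illustrates the statement rather than the proof.

```python
import math

def regular_polygon_area(n, perimeter=2 * math.pi):
    """Area of a regular n-gon with the given perimeter: P^2 / (4 n tan(pi/n))."""
    return perimeter ** 2 / (4 * n * math.tan(math.pi / n))

sides = (3, 4, 6, 12, 100, 10000)
areas = [regular_polygon_area(n) for n in sides]

# Gap between the isoperimetric bound pi and each polygon's area.
gaps = [math.pi - a for a in areas]
```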
A Note on Entire Solutions
of the Eiconal Equation 


Dmitry Khavinson 


The eiconal equation $\sum_{j=1}^{n}(\partial u/\partial x_j)^2 = 1$, $u: \mathbf{R}^n \to \mathbf{R}$, is one of the main equations
of geometrical optics. Its characteristics represent the light rays, while the level
surfaces of solution u can be thought of as wave fronts (cf., e.g., [3]). Here,
however, we are interested in the complex-analytic solutions of that equation.
Consider a holomorphic solution u of the eiconal equation in two variables

$$u_x^2 + u_y^2 = 1, \qquad ((x, y) \in \mathbf{C}^2). \qquad (1)$$


The purpose of this note is to prove the following: 


Theorem 1. Let u be an entire solution of equation (1) in $\mathbf{C}^2$. Then u is a linear
function, i.e., u = ax + by + c, where $a, b, c \in \mathbf{C}$, and $a^2 + b^2 = 1$.


Note. The theorem fails in all dimensions $\ge 3$. For example, consider in $\mathbf{C}^3$ the
function u(x, y, z) = x + f(y + iz), where f is an arbitrary entire function of one
variable. Then, u is nonlinear (whenever f is) and satisfies the eiconal equation

$$u_x^2 + u_y^2 + u_z^2 = 1$$

everywhere in $\mathbf{C}^3$.
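
The counterexample in the Note is easy to test numerically, since $u_x = 1$, $u_y = f'$ and $u_z = if'$, so the squares of the last two cancel. The sketch below takes f = exp and approximates the partial derivatives by central differences at a few complex points; the choice of f, the sample points, the step size, and the helper names are all assumptions of this illustration.

```python
import cmath

def u(x, y, z, f=cmath.exp):
    """u(x, y, z) = x + f(y + i z) for an entire function f (here exp)."""
    return x + f(y + 1j * z)

def partial(g, point, k, h=1e-5):
    """Central-difference partial derivative of g in its k-th complex variable."""
    plus = list(point)
    minus = list(point)
    plus[k] += h
    minus[k] -= h
    return (g(*plus) - g(*minus)) / (2 * h)

def eiconal_lhs(point):
    """Approximation of u_x^2 + u_y^2 + u_z^2 at the given point of C^3."""
    return sum(partial(u, point, k) ** 2 for k in range(3))

samples = [
    (0.3 + 0.1j, -0.2 + 0.4j, 0.5 - 0.3j),
    (1.0, 0.0, 0.0),
    (-0.7j, 0.2, 0.9 + 0.1j),
]
errors = [abs(eiconal_lhs(p) - 1) for p in samples]
```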


Proof of the theorem. Changing variables to z = x + iy, w = x − iy, we reduce the
problem to showing that an entire solution u(z, w) satisfying

$$u_z \cdot u_w = 1 \qquad (2)$$

is linear. Since $u_z \ne 0$ in $\mathbf{C}^2$, $u_z = e^{\gamma}$, while $u_w = e^{-\gamma}$, where $\gamma(z, w)$ is entire. We
have

$$u_{zw} = \gamma_w e^{\gamma} = -\gamma_z e^{-\gamma},$$

or

$$\gamma_w u_z + \gamma_z u_w = 0. \qquad (3)$$

Consider a level surface $\gamma_c = \{(z, w): \gamma(z, w) = c\}$, $c \in \mathbf{C}$. By (3), on $\gamma_c$

$$-\frac{\gamma_z}{\gamma_w} = \frac{u_z}{u_w} = e^{2c} = \text{const}. \qquad (4)$$

But the left-hand side in (4) is the derivative of the implicit function f(z) defined
by

$$\gamma(z, f(z)) = c.$$

Thus, $f'(z) = e^{2c} = \text{const}$, and so $\gamma_c = \{w = f(z)\}$ is a complex line with the
“slope” $e^{2c}$. Moreover, $\gamma_{c_1} \cap \gamma_{c_2} = \emptyset$ whenever $c_1 \ne c_2$. Otherwise, $\gamma$, which is an
entire function, would be taking two different values at the point of intersection of
$\gamma_{c_1}$ and $\gamma_{c_2}$. Hence, all “slopes” of the $\gamma_c$’s must be the same, i.e., $e^{2\gamma} = \text{const}$, and
so $\gamma = \text{const}$, and

$$u_z = e^{\gamma} = \text{const}, \qquad u_w = e^{-\gamma} = \text{const}.$$

Thus, u = Az + Bw + C, with AB = 1 as claimed.
Remarks 
(i) Peculiar cylinders $\Gamma = \{(x, y, z): x + f(y + iz) = \text{const}\}$ in $\mathbf{C}^3$ that prevent
extending the theorem to higher dimensions were first noted by G. Johnsson [4] in
connection with the Cauchy problem for the Laplace equation (also, cf. [6]).
(ii) Professor Tabachnikov has kindly pointed out to the author that Theorem 1
does extend to higher dimensions under an additional assumption that the restriction
of the function u to $\mathbf{R}^n$ is real-valued. In that case, the level surfaces of
function u represent the fronts of light rays moving along normals to a given
surface, say $\Gamma_0 := \{x \in \mathbf{R}^n: u(x) = 0\}$, with a constant speed. The eiconal equation
then implies that none of those level surfaces has singular points, i.e., the
normals to $\Gamma_0$ do not contain any finite focal points. Hence, $\Gamma_0$ has zero curvature
(cf., e.g., [9, Section 6]). So, all level surfaces of u are parallel planes, and then,
e.g., by solving explicitly the initial value problem for the eiconal equation, one
readily verifies that u must be a linear function.

(iii) Differentiating (2) with respect to z and w once more, we obtain, after
some straightforward algebraic manipulations, that u satisfies

$$u_{zz}u_{ww} - u_{zw}^2 = 0. \qquad (5)$$

(5) is a degenerate Monge-Ampère equation. There is an enormous literature
dedicated to the study of such non-degenerate equations, i.e., with a non-vanishing
right-hand side, e.g., cf. [3], [7], and the references cited there. Some of the results
stemming from a celebrated theorem of S. Bernstein [1, 2, 5, 8, 10, 11, 12], that a
minimal surface over a whole plane must be a plane, seem to be very close in
flavor to Thm. 1. For example, a theorem of Jörgens [5] states that a $C^2$-solution in
$\mathbf{R}^2$ of (5) with the right-hand side equal to 1 must be a quadratic polynomial. (For
another proof of that based on ideas from [7], see [10].) Perhaps, one could revise
some of those arguments to include the degenerate case (5), and then in view of
(2), obtain another high-ground proof of Thm. 1.


For this last remark, and the references, I am indebted to Professor H. S. Shapiro. 
The Uniqueness Aspect
of the Fundamental Theorem of Finite 
Abelian Groups 


David B. Surowski 


We shall use additive notation for abelian groups. The following is well-known to 
every graduate student of mathematics: 


Fundamental theorem of finite Abelian groups. Let A be a finite abelian group.
Then there exist cyclic subgroups $Z_1, Z_2, \ldots, Z_r$ of orders $m_1, m_2, \ldots, m_r > 1$,
respectively, satisfying $m_2 \mid m_1$, $m_3 \mid m_2$, ..., $m_r \mid m_{r-1}$ such that

$$A = Z_1 \oplus Z_2 \oplus \cdots \oplus Z_r.$$

Furthermore, the integers r and $m_1, \ldots, m_r$ are uniquely determined.


Note that the above theorem involves two parts: an existence part and a 
uniqueness part. While there are many short papers that provide novel proofs of 
the existence aspect, the uniqueness aspect has been largely neglected. Indeed, an 
alarming number of textbook treatments of the “Fundamental Theorem” do not 
even mention uniqueness as part of the theorem. Those treatments that do address 
uniqueness all, in varying degrees, obtain the uniqueness along the lines of the 
argument as given in Mac Lane and Birkhoff’s standard text [2], or by an analysis of
the i-rowed minors of the “relations” matrix defining A. (Compare [1; Theorem
3.9]; this amounts to a proof of the uniqueness, up to associates, of the “Smith
Canonical Form” of the relations matrix defining A.) There may be some who, on
the grounds of “purity,” might object to arguments akin to those in [2], as they 
invoke the uniqueness of dimension of a vector space. While this is hardly a serious 
objection, our argument below is quite independent of even this simple result. 

Thus, assume that we have decompositions of the finite abelian group A into
direct sums of cyclic groups:

$$Z_1 \oplus Z_2 \oplus \cdots \oplus Z_r = A = Z'_1 \oplus Z'_2 \oplus \cdots \oplus Z'_s,$$

where $|Z_i| = m_i$, $|Z'_j| = m'_j$, $i = 1, \ldots, r$, $j = 1, \ldots, s$, and that the above divisibility
conditions on the orders $m_i$, $m'_j$ hold. Note first of all, that if we set m =
exp(A) = least positive integer m such that ma = 0 for all $a \in A$, then $m_1 = m'_1 = m$.
We call m the exponent of A. Thus, we have decompositions of the form

$$Z \oplus B = A = Z' \oplus B',$$

where Z, Z' are cyclic of order m. The idea is to prove that there exists an
automorphism $\varphi: A \to A$ such that $\varphi(Z) = Z'$, for then it would follow that
$B \cong B'$, and the desired uniqueness would follow by induction.

We hasten to concede a small logical glitch in the above induction. Indeed, we 
have started with a fixed abelian group A with two direct sum decompositions; 
after application of the above isomorphism we obtain two isomorphic, but not 
identical groups, viz., B and B’. However, this is not a serious problem and the 
student should have no difficulty in enunciating an induction hypothesis suffi- 
ciently general to be applicable here. 
What is really going on is summarized in the following:


Theorem. The group Aut(A) of automorphisms of A acts transitively on the cyclic 
subgroups of order m = exp(A) of A. 


In other words, given any two cyclic subgroups $C, C' \le A$, both of order m, then
there exists an automorphism $\psi: A \to A$ with $\psi(C) = C'$. In fact, we'll prove a
slightly stronger result, namely that Aut(A) actually acts transitively on the
elements of order m in A.

To prove this result, it suffices to prove that if p is a prime and if $p^k$ is the
highest power of p that divides m, then Aut(A) acts transitively on the elements
of order $p^k$ in A. It therefore suffices to consider the case in which the exponent
m is itself a prime power: $m = p^k$.

Thus we have a decomposition $A = Z_1 \oplus Z_2 \oplus \cdots \oplus Z_r$, where $Z_i$ is cyclic of
order $p^{k_i}$, and where $k_1 = k \ge k_2 \ge \cdots \ge k_r$.

The following two lemmas are very easy, but fundamental. 


Lemma 1. Let $B = B_1 \oplus B_2$ be an abelian group and let $\mu_1: B_1 \to B$, $\mu_2: B_2 \to B$
be injective homomorphisms. If $\mu_1(B_1) \cap \mu_2(B_2) = 0$, then the mapping $\mu: B \to B$
defined by $\mu(b_1 + b_2) = \mu_1(b_1) + \mu_2(b_2)$, $b_1 \in B_1$, $b_2 \in B_2$, is an automorphism
of B.


Lemma 2. Let $A = Z_1 \oplus Z_2 \oplus \cdots \oplus Z_r$ be as above, and assume that $Z_i = \langle z_i \rangle$ is
cyclic of order $p^{k_i}$, $i = 1, \ldots, r$. Assume that $k = k_1 = k_2 = \cdots = k_l$, $k_{l+1} < k$. If

$$a = \sum_{i=1}^{r} a_i z_i \in A,$$

then a has order $p^k$ if and only if $p \nmid a_j$ for some j, $1 \le j \le l$.


Proof: Simply note that because $Z_1 \oplus Z_2 \oplus \cdots \oplus Z_r$ is a direct sum, we conclude
that the order of $\sum_{i=1}^{r} a_i z_i$ is equal to the least common multiple of the individual
orders $o(a_i z_i)$, $i = 1, \ldots, r$. Since $p \nmid a_j$, we see that $o(a_j z_j) = o(z_j) = p^k$.

Thus, let $a \in A$ be an arbitrary element of order $p^k$. We shall show that there
exists an automorphism $\varphi: A \to A$ such that $\varphi(z_1) = a$. If we write

$$a = \sum_{i=1}^{r} a_i z_i,$$

we may assume that $p \nmid a_1$. Indeed, if $p \nmid a_j$, then an automorphism of A that
interchanges $Z_1$ and $Z_j$ will reduce us to this situation. Next, write
$A = Z_1 \oplus B_1$, where $B_1 = Z_2 + \cdots + Z_r$, and define injections $\mu_1: Z_1 \to A$,
$\mu_2: B_1 \to A$ by setting $\mu_1(z_1) = a$, $\mu_2 = 1_{B_1}$. Since $p \nmid a_1$, a multiple ta lies in $B_1$
only if $p^k \mid t$, that is, only if ta = 0; hence $\mu_1(Z_1) \cap \mu_2(B_1) = 0$. Therefore, by
Lemma 1, the map $\mu: A \to A$ given by $\mu(w_1 + b_1) = \mu_1(w_1) + \mu_2(b_1)$ defines an
automorphism of A which carries $z_1$ to a.
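
For a small group the theorem can be verified by brute force. The sketch below enumerates all automorphisms of $Z_4 \oplus Z_2$ (so p = 2, k = 2, exponent 4) and compares the orbit of the generator $z_1 = (1, 0)$ with the set of elements of order 4; the choice of group and every name in the code are assumptions of this illustration.

```python
from itertools import product

MODS = (4, 2)                      # A = Z_4 (+) Z_2, exponent m = 4
ZERO = (0, 0)
ELEMENTS = list(product(range(MODS[0]), range(MODS[1])))

def add(a, b):
    return tuple((x + y) % m for x, y, m in zip(a, b, MODS))

def times(t, a):
    return tuple((t * x) % m for x, m in zip(a, MODS))

def order(a):
    t = 1
    while times(t, a) != ZERO:
        t += 1
    return t

def endomorphism(images):
    """Additive map sending the i-th standard generator z_i to images[i]."""
    def phi(a):
        out = ZERO
        for coeff, img in zip(a, images):
            out = add(out, times(coeff, img))
        return out
    return phi

automorphisms = []
for images in product(ELEMENTS, repeat=2):
    # z_i has order MODS[i], so its image must be killed by MODS[i].
    if any(times(m, img) != ZERO for m, img in zip(MODS, images)):
        continue
    phi = endomorphism(images)
    if len({phi(a) for a in ELEMENTS}) == len(ELEMENTS):   # bijective
        automorphisms.append(phi)

exponent = max(order(a) for a in ELEMENTS)
top = {a for a in ELEMENTS if order(a) == exponent}        # elements of order m
orbit = {phi((1, 0)) for phi in automorphisms}             # images of z_1
```

Changing `MODS` to another pair of prime powers of the same prime, such as `(9, 3)`, repeats the check for a different group; the generator list is hard-coded for two summands.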
UNSOLVED PROBLEMS


Edited by: Richard Guy & Richard Nowakowski 


In this department the MONTHLY presents easily stated unsolved problems dealing 
with notions ordinarily encountered in undergraduate mathematics. Each problem 
should be accompanied by relevant references (if any are known to the author) and by 


a brief description of known partial or related results. Typescripts should be sent to 
Richard Guy, Department of Mathematics & Statistics, The University of Calgary, 
Alberta, Canada T2N 1N4. 


Coin-Weighing Problems 


Richard K. Guy and Richard J. Nowakowski 


The question of finding a single counterfeit coin from a set of regular coins in the 
fewest number of weighings using just a balance beam has been a notorious 
problem. The regular coins are all the same weight while the counterfeit coin is a 
different weight. 

The problem was popular on both sides of the Atlantic during World War II
[14, 15, 20, 21, 27, 34, 39, 46]; it was even suggested that it should be dropped over
Germany in an attempt to sabotage their war effort; see [35, 40, 43] for some
history. In solving [39] Kaplansky, Neugebauer and Pennell gave the following
general solution to the problem of underweight counterfeit coins: If $3^{n-1} \le N < 3^n$
then n weighings suffice to show if there is (and to identify) a counterfeit coin among
N coins. If it is known that a counterfeit coin exists then n weighings will identify
the coin from among N coins if $3^{n-1} < N \le 3^n$. Dyson [12] gave an elegant
solution using ternary labels when it is not known if the counterfeit coin is heavy or
light; see [11] for a solution in verse. In this case, n weighings suffice

(i) if $N \le (3^n - 3)/2$ and it is required to find if the dud is heavy or light;

(ii) if $N \le (3^n - 1)/2$, given an extra coin known to be good, and it is required
to find if the dud is heavy or light; and

(iii) if $N \le (3^n + 1)/2$, if there is a good coin but the relative weight of the
counterfeit coin is not required.

In the solution to the general problem posed in [15], the editors note that all the
solutions so far consider the coins to be distinguishable when in the balance pan.
They show that if the coins in a scale pan are to be considered as a single set, then
n weighings will find a coin amongst $N \le (7 \times 3^{n-2} - 1)/2$.
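
The three sufficient conditions (i)-(iii) are simple enough to tabulate. The sketch below encodes them as capacities and finds the least n for which the quoted condition holds; the function names and the variant labels are assumptions of this illustration, and the code only evaluates the stated bounds without constructing any weighing scheme.

```python
def capacity(n, variant):
    """Largest N covered by n weighings under the quoted conditions (i)-(iii)."""
    if variant == "i":      # heavy/light must be reported, no reference coin
        return (3 ** n - 3) // 2
    if variant == "ii":     # heavy/light must be reported, one extra good coin
        return (3 ** n - 1) // 2
    if variant == "iii":    # good coin available, heavy/light not required
        return (3 ** n + 1) // 2
    raise ValueError(variant)

def weighings_sufficient(N, variant):
    """Least n for which the quoted condition guarantees n weighings suffice."""
    n = 1
    while capacity(n, variant) < N:
        n += 1
    return n

# The familiar twelve-coin puzzle is the case n = 3 of variant (i).
twelve = weighings_sufficient(12, "i")
```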

There are many other variants [1, 7, 9, 32]. Forsythe, a responder to [14], seems
to be the first to ask the question using a spring balance, i.e., a weighing device that
will return the exact weight; see also [33, 41]. Christen [8] asks the question for two
counterfeit coins but of complementary weights. Hwang [25] proposes and analyses
many weighing schemes.

Shapiro’s problem [41] assumes N coins, N − 1 of weight a and one of weight b
where a and b are known, and, as with Forsythe, an accurate scale. He asks for the
least number of weighings to determine which coin has weight b, where the
weighing scheme must be given in advance. Söderberg and Shapiro [44] ask the
more general question of how many weighings are needed to determine which of
N coins are of weight a and which of weight b if the numbers of each are not
known. Denote this number of weighings by f(N); then they show that (i)
$f(N) \ge N/\log_2(N + 1)$; (ii) $f(3^{m-1}(3 + m)) \le 3^m$; (iii) $f(5^{m-1}(2m + 5)) \le 5^m$;
and that (iv) $f(N) = O(N/\ln N)$. In addition Erdős and Rényi [13] (and several
others independently) show that

$$f(N) = \frac{2N}{\log_2 N} + O\left(\frac{N \ln\ln N}{(\ln N)^2}\right).$$

Cantor and Mills [6] and Lindström [28, 29, 30] give explicit weighing schemes for
$N = 2^{k-1}k$ (also see [1]). The result of Liu [31] is not as good as this.

Another variant is that of deciding which coins are counterfeit out of N coins 
but the number of weighings is fixed. The “Lower Slobbovian Counterfeiters” [17, 
4, 24] and ApSimon’s Mints problem [2, 23] are examples. 

Some years ago Sir Alexander Oppenheim reminded us of yet another variant.
It was perhaps first stated by Bellman and Gluss [3]: use a beam balance to find k
counterfeit coins among N coins where all the counterfeit coins are of the same
weight. Let w(N, k) be the least number of weighings required to find the k
lighter coins. It is easy to see that w(N, k) is at least $\log_3\binom{N}{k}$. Pyber [36] showed
that if there were no more than m light coins then they could be identified in
$\lceil \log_3\binom{N}{m} \rceil + 15m$ weighings.

The case k = 2 has recently attracted attention [5, 25, 45]. Tošić gave weighing
procedures which improved on those of Cairns and showed that the lower bound
could be attained apart from one possible extra weighing. For example, with seven
coins weigh 123 against 456. If they balance, weigh 1 against 2 and 4 against 5. If
123 are heavier, then 4567 contain two light coins which can be determined in two
more weighings. In the special case $w(N = 3^m, 2) = 2m$, the extra weighing is
never needed.

In which cases is $w(N, 2) = \lceil \log_3\binom{N}{2} \rceil + 1$? Is N = 13 the first?
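
The counting lower bound on w(N, k) is easy to tabulate: each weighing on a beam balance has three outcomes, and there are $\binom{N}{k}$ possible sets of light coins. The sketch below computes the ceiling of $\log_3\binom{N}{k}$ in exact integer arithmetic; the function names are assumptions of this illustration, and the code says nothing about whether the bound is attained.

```python
from math import comb

def ceil_log3(x):
    """Smallest integer t with 3**t >= x, using integers only."""
    t, power = 0, 1
    while power < x:
        power *= 3
        t += 1
    return t

def lower_bound(N, k):
    """Counting bound: 3 outcomes per weighing, C(N, k) cases to separate."""
    return ceil_log3(comb(N, k))

# Bound for k = 2 and a few values of N, including the case N = 13 asked about.
table = {N: lower_bound(N, 2) for N in (7, 9, 13, 27)}
```

For instance, the bound for seven coins with two light ones is 3, matching the three-weighing procedure described in the text.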


If there are three lighter coins, then there is no new problem until we get to
w(6, 3) = 3 which was the subject of a problem in [10]. Oppenheim showed that
w(7, 3) = w(8, 3) = 4: first weigh three coins against three. Nine coins require 5
weighings. Can the lighter coins be identified in five weighings if N = 12? $3^m$ coins
can be sorted in 3m weighings; this can be improved by at most one weighing;
when?

If k = 4 we know that w(8, 4) = w(9, 4) = 5; although $\binom{8}{4} = 70 < 3^4$ it is not
possible to make a weighing among 8 coins with 4 light each of whose outcomes
leave less than 26 possibilities and while $26 < 3^3$ they cannot be separated by three
weighings. We can show that $w(3^m, 4) \le 4m - 1$.

If k = 5, then w(10, 5) = 6 and we can show that $w(3^m, 5) \le 5m$, but this can
almost certainly be improved.

The problem of Söderberg and Shapiro, but using a beam balance in place of a
spring balance is: given N coins which each weigh one of two weights, determine
the least number, W(N), of weighings required to find which coins are of each
weight. Of course, if all the coins are of the same weight, we won’t be able to say 
which of the two possible weights they are. We see that W(1) = 0, W(2) = 1, 
W(3) = 2, W(4) = 3, W(5) = 4, W(6) = 4 and generally W(N) = flog, 2%]. For 
which N is there equality?

Notice that we don’t require that the whole weighing scheme be given in 
advance, as has been done in the more elegant solutions of the famous 12 coin 
problem. The subsequent weighings depend on the results of the previous ones. 
We could also ask for the minimum number of weighings, if these are all to be 
prescribed before the first weighing is made. 
Bridges would not be safer if only people who knew
the proper definition of a real number were allowed to 
design them. 
PROBLEMS AND SOLUTIONS


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


Proposed problems should be sent to the MONTHLY PROBLEMS address given on 
the inside front cover. Please include solutions, relevant references, etc. Three copies 
are requested. 


Solutions of published problems should arrive before July 31, 1995 at the MONTHLY 
PROBLEMS address given on the inside front cover. Solutions should be typed with 
double spacing, including the problem number and the solver’s name and mailing 
address. Two copies suffice. A self-addressed postcard or label should be included if 
an acknowledgement is desired.


An asterisk (*) after the number of a problem, or part of a problem, indicates that no 
solution is currently available. Partial solutions will be useful in such cases. Otherwise, 
the published solution is likely to be based on a solution which is complete and
correct. Of course, an elegant partial solution or a method leading to a more general 
result is always useful and welcome. In addition, references to other appearances 
of MONTHLY problems or to solutions of these problems in the literature are also 
solicited. 


PROBLEMS 


10431. Proposed by Yury J. Ionin, Central Michigan University, Mt. Pleasant, MI. 


For positive integers n and s with n > s, the falling factorial $(n)_s$ is defined as $n!/(n-s)!$.

Let d(n, s) denote the greatest common divisor of the falling factorials $(n)_s$ and $(n + s)_s$.
Prove that $d(n, s) \mid (2s - 1)_{\lfloor 4s/3 \rfloor}$.
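
Before attempting a proof, the divisibility can be explored by machine. The sketch below reads the bound as the falling factorial $(2s - 1)_{\lfloor 4s/3 \rfloor}$ and searches a small range for counterexamples; that reading of the statement, the search ranges, and the function names are assumptions of this illustration.

```python
from math import gcd

def falling(n, s):
    """Falling factorial (n)_s = n (n - 1) ... (n - s + 1)."""
    out = 1
    for j in range(s):
        out *= n - j
    return out

def d(n, s):
    """Greatest common divisor of (n)_s and (n + s)_s."""
    return gcd(falling(n, s), falling(n + s, s))

def bound(s):
    """The falling factorial (2s - 1) with floor(4s / 3) factors."""
    return falling(2 * s - 1, (4 * s) // 3)

# Pairs (n, s) in a small range for which d(n, s) fails to divide the bound.
violations = [(n, s) for s in range(1, 7) for n in range(s + 1, 200)
              if bound(s) % d(n, s) != 0]
```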


10432. Proposed by David M. Bloom, Brooklyn College, CUNY, Brooklyn, NY. 


Let
$$P = \{p \in \mathbf{Z}^+ : p \text{ is prime and } p \equiv 3 \pmod 4\}.$$
For p € P, let S(p) denote the sum of all quadratic residues (mod p) that lie in the interval 
(0, p/2), and let R(p) denote the least positive residue of S(p) (mod p). 


(a) Prove that R is one-to-one. 
(b) Show that there are infinitely many positive integers that are not in the range of R. 
10433. Proposed by Daniel R. L. Brown (student), Kenneth R. Davidson, and Jeffrey Shallit,
University of Waterloo, Waterloo, Ontario, Canada.


Let $x_1, x_2, x_3, \ldots$ be any sequence of positive real numbers, and let k be any positive
integer.
(a) Show that

$$\limsup_{n \to \infty} \frac{x_1 + x_2 + \cdots + x_{n+1}}{x_n} \ge 4.$$

(b) More generally, show that

$$\limsup_{n \to \infty} \frac{x_1 + x_2 + \cdots + x_{n+k}}{x_n} \ge \frac{(k+1)^{k+1}}{k^k}.$$

(c) Show that these bounds are best possible.
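
A geometric sequence gives a feel for where the constant comes from. With $x_j = r^j$ and $r = (k+1)/k$ the quotient tends to $r^{k+1}/(r-1) = (k+1)^{k+1}/k^k$, which the sketch below confirms numerically; the choice of sequence, the cutoff n, and the function names are assumptions of this illustration and do not constitute a solution of the problem.

```python
def quotient(k, n=200):
    """(x_1 + ... + x_{n+k}) / x_n for the geometric sequence x_j = ((k+1)/k)**j."""
    r = (k + 1) / k
    total = sum(r ** j for j in range(1, n + k + 1))
    return total / r ** n

def claimed_bound(k):
    """The constant (k+1)^(k+1) / k^k from part (b)."""
    return (k + 1) ** (k + 1) / k ** k

# Difference between the quotient at n = 200 and the constant, for k = 1..4.
gaps = [abs(quotient(k) - claimed_bound(k)) for k in range(1, 5)]
```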


10434. Proposed by Daniel Goffinet, Saint Etienne, France. 


Let P be the set of nonconstant periodic mappings from R to R, endowed with the 
topology derived from the supremum norm. Find the components of P. 


10435. Proposed by Jonathan L. King, University of Florida, Gainesville, FL. 


Let the function K(x, y) be nonnegative and continuous on $0 \le x \le 1$, $0 \le y \le 1$.
Suppose that there are functions f(x), g(x), each positive and continuous for $0 \le x \le 1$,
such that

$$\int_0^1 f(y)K(x, y)\,dy = g(x) \quad\text{and}\quad \int_0^1 g(y)K(x, y)\,dy = f(x).$$

Is it true that f(x) = g(x) for $0 \le x \le 1$?


10436. Proposed by Donald A. Darling, Newport Beach, CA. 


Let the unit interval (0, 1) be divided at random into two subintervals. That is, form the
intervals $L_1 = (0, X)$ and $R_1 = (X, 1)$ where X is a random variable uniformly distributed
in (0, 1). The interval $L_1$ is similarly divided, independently of the first division, into $L_2$ and
$R_2$, and the process of dividing the leftmost subinterval is continued indefinitely yielding
two sequences $\{L_1, L_2, \ldots\}$ and $\{R_1, R_2, \ldots\}$ of intervals. Let the length of an interval I be
denoted by |I|. Find the distribution of $S = \sum |L_n|$, i.e., the function F(x) = Pr{S < x}.


10437. Proposed by J. Maurice Rojas (student), University of California, Berkeley, CA, and 
A T & T Bell Laboratories, Naperville, IL. 


Let R be a ring (whose multiplication is not necessarily commutative or associative)
without zero divisors. Let $x_1, \ldots, x_n$ be algebraically independent indeterminates over R
which commute and associate amongst themselves and commute with the elements of R.
Also assume the associative law for products of one element of R and two $x_i$. Prove the
following.

(a) If $f \in R[x_1, \ldots, x_n]$ is homogeneous, then any divisor of f is homogeneous.

(b) If $a_1, \ldots, a_n$ are nonzero elements of R and $d_1, \ldots, d_n$ are nonnegative integers with
$\gcd(d_1, \ldots, d_n) = 1$, then the polynomial

$$a_1x_1^{d_1} + \cdots + a_nx_n^{d_n}$$

is irreducible in $R[x_1, \ldots, x_n]$, i.e., every factorization has at most one nonconstant factor.
NOTES


(10432) A number a is a quadratic residue modulo p if $p \nmid a$ and a is congruent (mod
p) to the square of an integer. Thus, for p = 7, the quadratic residues are 1, 2, −3, so
R(7) = S(7) = 3. For p = 11, the quadratic residues are 1, −2, 3, 4, 5. Thus S(11) = 13
and R(11) = 2. (10435) If K(x, y) is required to be positive, the conclusion is true. This
positive result was problem B-4 of the 1993 Putnam Competition. (10436) It is easily shown
that the related expression $\sum |R_n|$ is equal to 1 with probability one.


SOLUTIONS 


A Trigonometric Matrix Norm 


E 3473 [1991,956]. Proposed by Lawrence J. Wallen, University of Hawaii at Manoa, Hon- 
olulu. 


Suppose $0 \le \theta_1 < \theta_2 < \cdots < \theta_n < \pi$. Let $A$ be the $n$ by $n$ matrix whose entry in the
$i$th row and $j$th column is $\sin|\theta_i - \theta_j|$. Show that

$$\|A\| \le \cot(\pi/2n)$$

and that the estimate is best possible. Here $\|A\|$ is defined as $\sup|AX|$, where the supremum
is taken over all column vectors $X$ in $\mathbb{R}^n$ with Euclidean norm 1.


Solution I by the proposer. Since $A$ has nonnegative entries with positive off-diagonal
elements, $A^2$ has positive entries (except for $n = 2$). Perron-Frobenius Theory assures us
that if $\xi$ is the maximum modulus of the eigenvalues for $A$, then $\xi$ is an eigenvalue of $A$
with an eigenvector having strictly positive entries $\lambda_i$. Since $A$ is symmetric, $\xi = \|A\|$.

Let $P$ be a $2n$-gon whose opposite sides $\sigma_i$ and $\sigma_{i+n}$ are parallel and of equal length $\lambda_i$
($i = 1, \ldots, n$), and let edge $\sigma_i$, if extended, make angle $\theta_i$ with the $x$-axis.


Claim. Any such polygon $P$ can be tesselated with $\binom{n}{2}$ parallelograms whose edges are
translates of the pairs $\sigma_i, \sigma_j$ for $1 \le i < j \le n$.


Proof. The result is clearly true for $n = 2$. This will be the basis of our induction. (The
case $n = 1$ may also be considered to be vacuously true.)

Delete edges $\sigma_n$ and $\sigma_{2n}$, and translate one of the two broken half-perimeters by $\sigma_n$ to
meet the other. This forms a $2(n - 1)$-gon $P'$ that also has opposite sides parallel and of equal
length. We may assume by induction that $P'$ has the desired tesselation property. Also, the
region swept out by the moving half-perimeter is tesselated by parallelograms whose edges
are translates of the pairs $(\sigma_i, \sigma_n)$ for $1 \le i < n$. Putting these two observations together
gives the required tesselation of $P$.

This dissection of $P$ shows that

$$\mathrm{Area}(P) = \sum_{i<j} \lambda_i\lambda_j \sin(\theta_j - \theta_i) = \frac{1}{2}\sum_{i,j} \lambda_i\lambda_j \sin|\theta_i - \theta_j| = \frac{\xi}{2}\Big(\sum_i \lambda_i^2\Big).$$

Now, the isoperimetric quotient (area / perimeter$^2$) of a convex $m$-gon is greatest for the
regular $m$-gon, and this value is $(1/4m)\cot(\pi/m)$ (see I. M. Yaglom & V. G. Boltyansky,
Convex Figures, Holt, Rinehart and Winston, 1961). Hence we have, by Cauchy’s inequality,

$$\frac{\xi}{2}\Big(\sum_i \lambda_i^2\Big) \le \Big(2\sum_i \lambda_i\Big)^2 \frac{1}{8n}\cot\Big(\frac{\pi}{2n}\Big) \le \frac{1}{2}\Big(\sum_i \lambda_i^2\Big)\cot\Big(\frac{\pi}{2n}\Big),$$

giving the required bound.


If $\theta_k = k\pi/n$, $k = 0, \ldots, n - 1$, then it is easy to compute that all row sums of $A$ are
equal to

$$\sum_{k=1}^{n-1} \sin\Big(\frac{k\pi}{n}\Big) = \cot\Big(\frac{\pi}{2n}\Big).$$

Hence $\cot(\pi/2n)$ is an eigenvalue with $(1, 1, \ldots, 1)^T$ as an eigenvector. Thus, the upper
bound is attained.
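
The row-sum identity used for the extremal example can be checked numerically. The following is a minimal sketch, not part of the published solution; the helper names `row_sums` and `cot` are illustrative assumptions.

```python
import math

def cot(x):
    """Cotangent via cos/sin (the math module has no cot)."""
    return math.cos(x) / math.sin(x)

def row_sums(n):
    """Row sums of A = [sin|theta_i - theta_j|] for theta_k = k*pi/n, k = 0..n-1."""
    thetas = [k * math.pi / n for k in range(n)]
    return [sum(math.sin(abs(ti - tj)) for tj in thetas) for ti in thetas]

# Every row sum should agree with cot(pi/2n) up to rounding error.
for n in (2, 3, 5, 8):
    target = cot(math.pi / (2 * n))
    worst = max(abs(s - target) for s in row_sums(n))
    print(n, target, worst)
```

Equal row sums mean that the all-ones vector is an eigenvector, which is the fact the solution relies on.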


Solution II by Anchorage Math Solutions Group, University of Alaska, Anchorage, AK. An
example showing that the upper bound is attained was given in Solution I. It need not be
repeated.


Write $\alpha_i$ for $|\theta_{i+1} - \theta_i|$. Then $A$ is the real part of an Hermitian matrix $H = -iC$, where

$$C = \begin{pmatrix}
0 & e^{i\alpha_1} & e^{i(\alpha_1+\alpha_2)} & \cdots & e^{i(\alpha_1+\cdots+\alpha_{n-1})} \\
-e^{-i\alpha_1} & 0 & e^{i\alpha_2} & \cdots & e^{i(\alpha_2+\cdots+\alpha_{n-1})} \\
-e^{-i(\alpha_1+\alpha_2)} & -e^{-i\alpha_2} & 0 & \cdots & e^{i(\alpha_3+\cdots+\alpha_{n-1})} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-e^{-i(\alpha_1+\cdots+\alpha_{n-1})} & -e^{-i(\alpha_2+\cdots+\alpha_{n-1})} & -e^{-i(\alpha_3+\cdots+\alpha_{n-1})} & \cdots & 0
\end{pmatrix}.$$

We begin by transforming $C$ so that its eigenvalues can be identified. First, let $D =
\mathrm{diag}(1, e^{i\alpha_1}, e^{i(\alpha_1+\alpha_2)}, \ldots, e^{i(\alpha_1+\cdots+\alpha_{n-1})})$. Then, $C_1 = DCD^{-1}$ has 0 on the diagonal, $+1$
everywhere above the diagonal and $-1$ everywhere below the diagonal. Since $C$ and $C_1$ are
similar, they have the same eigenvalues. Now, $C_1 = f(T)$ where $f(x) = (x^n - x)/(x - 1)$
and $T = (t_{i,j})$ with $t_{i,i+1} = 1$ ($i = 1, \ldots, n - 1$), $t_{n,1} = -1$, and all other entries zero.
Since $T^n = -I$, the eigenvalues of any $f(T)$ are $f(\tau_1), f(\tau_2), \ldots, f(\tau_n)$, where the $\tau_j$ are
the roots of $z^n = -1$. Such matrices are known as skew-circulants (see P. J. Davis, Circulant
Matrices, Wiley, 1979, sect. 3.2.1, pp. 83-84).

It follows that the eigenvalues of $C$ are of the form $i\cot((2k - 1)\pi/2n)$ for $k =
1, 2, \ldots, n$, and the eigenvalue of largest absolute value arises from $k = 1$, corresponding
to the largest eigenvalue of $H$. The Rayleigh-Ritz theorem (see Roger A. Horn &
Charles R. Johnson, Matrix Analysis, Cambridge, 1985, theorem 4.2.2, p. 176) gives
$\cot(\pi/2n) = \sup_{z \ne 0} \langle Hz, z\rangle/\langle z, z\rangle$. Separate real and imaginary parts as $H = A + iB$
and $z = x + iy$. Since $H$ is Hermitian, $B$ is real and skew-symmetric, so $\langle Hx, x\rangle = \langle Ax, x\rangle$
for real $x$. Restriction to vectors with $y = 0$ therefore shows that $\cot(\pi/2n)$ is an upper bound
on the largest eigenvalue of $A$. As in Solution I, this gives the desired bound on $\|A\|$.


No other solutions were received.


Coupled Squares


10213 [1992, 361]. Proposed by P. G. Walsh, University of Waterloo, Waterloo, Ontario,
Canada.


Suppose x and y are positive integers such that x + xy and y + xy are both squares. 
(a) Prove that exactly one of x and y is a square. 


(b) Characterize all such pairs of integers x, y. 


Solution by Robin J. Chapman, University of Exeter, Exeter, U. K.. Let N denote the set of 
positive integers. 
(a) We first note that $x$ and $y$ cannot both be squares. For if $x$ is a square, then $y + 1$
is also a square (since $x(y + 1)$ is a square), making it impossible, as $y \in N$, for $y$ to be a
square.

Suppose now that $x, y \in N$ and $x(y + 1)$ and $y(x + 1)$ are both squares. Write $x = ac^2$
and $x + 1 = bd^2$ where $a$ and $b$ are both squarefree. Then $ac^2(y + 1)$ and $bd^2y$ are both
squares, and so $y + 1 = au^2$ and $y = bv^2$ for some $u, v \in N$. Hence

$$1 = au^2 - bv^2 \quad\text{and}\quad -1 = ac^2 - bd^2. \tag{1}$$

It follows that $(auc + bvd)^2 - ab(cv + ud)^2 = (au^2 - bv^2)(ac^2 - bd^2) = -1$ and so the
negative Pell equation $X^2 - abY^2 = -1$ has integer solutions. Note that $ab$ is squarefree
as $a$ and $b$ are coprime, and that $ab \ne 1$ as $x(x + 1)$ is not a square. Now by the theory of
Pell’s equation, if $r^2 - abs^2 = 1$ with $r, s \in N$ then $r + s\sqrt{ab} = (p + q\sqrt{ab})^2$ where
$p$ and $q$ are integers with $p^2 - abq^2 = \pm 1$. Apply this with $r = 2x + 1$ and $s = 2cd$.
Writing $r + s\sqrt{ab} = (p + q\sqrt{ab})^2$ with $p^2 - abq^2 = \pm 1$ gives $2x + 1 = p^2 + abq^2$.
Combining these two conditions on $p$ and $q$, it follows that either $p^2 = x + 1$ or $p^2 = x$.
In the latter case $x$ is a square. In the former, $x + 1$ is a square and as $y(x + 1)$ is a square, it
follows that $y$ is a square.

(b) Suppose by symmetry that $x = s^2$ is a square. Then $y + 1 = u^2$ is also a square.
Now $(x + 1)y = (s^2 + 1)(u^2 - 1)$ is a square and if $k$ is the greatest common divisor of
$s^2 + 1$ and $u^2 - 1$ then

$$s^2 + 1 = kt^2 \quad\text{and}\quad u^2 - 1 = kv^2 \tag{2}$$

for some $t, v \in N$. Thus, $s$ and $t$ give a solution of the negative Pell equation $s^2 - kt^2 = -1$.
By standard theory there are either zero or infinitely many such solutions for each $k \in N$.
Conversely, given such a $k$, one can find positive integers $s$, $t$, $u$ and $v$ satisfying (2). Then
$x = s^2$ and $y = kv^2$ will have the required property.
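
As an illustration only, a brute-force search over a small range makes the statement of part (a) concrete. This sketch is not from the published solution, and the names `is_square` and `coupled_pairs` are hypothetical.

```python
from math import isqrt

def is_square(m):
    """True when the nonnegative integer m is a perfect square."""
    r = isqrt(m)
    return r * r == m

def coupled_pairs(limit):
    """All (x, y) with 1 <= x, y <= limit such that x + xy and y + xy are both squares."""
    return [(x, y)
            for x in range(1, limit + 1)
            for y in range(1, limit + 1)
            if is_square(x * (y + 1)) and is_square(y * (x + 1))]

pairs = coupled_pairs(300)
# Part (a): in every pair found, exactly one of x and y is a square.
print(all(is_square(x) != is_square(y) for x, y in pairs))
print(pairs[:6])
```

For instance, (1, 8) arises from $k = 2$, $s = t = 1$, $u = 3$, $v = 2$ in (2), and (4, 80) from $k = 5$, $s = 2$, $t = 1$, $u = 9$, $v = 4$.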


Editorial comment. Dennis R. Estes observed that (1) gives rise to an equivalence between 
six ambiguous forms of discriminant ab. The fact that a = 1 or b = 1 follows from the 
work of Gauss showing that there are at most four ambiguous forms in the same class. 
Connections with this theory will be found in any solution. 

Several readers invoked the theory of Continued Fractions to give a more detailed char- 
acterization of those k € N for which the negative Pell equation is solvable. 

Richard Stong took a different approach, which we sketch here. Eliminate $y$ from the
equations $x + xy = m^2$ and $y + xy = n^2$ to obtain $x(1 + x) + xn^2 = (1 + x)m^2$. Fixing
$x$ in this equation and looking at solutions $(m, n)$, one finds that $(m', n')$ is also a solution
with $m' = (2x + 1)m - 2xn$ and $n' = (2x + 1)n - 2(x + 1)m$ (also using the theory of
the Pell equation), and $y'$ can be found so that $x + xy' = m'^2$ and $y' + xy' = n'^2$. An
easy analysis shows that $0 < m' < m$ if $n > m > 0$. This reduction, combined with
the operation of interchanging $x$ and $y$ (with the corresponding interchange of $m$ and $n$),
allows every solution to be linked to a degenerate solution in which $x = 0$. Retracing one’s
steps allows the general solutions to be created out of the degenerate solution. The required
property is proved by showing that it is preserved by these steps.


Solved also by J. Anglesio (France), D. R. Estes, N. J. Fine, I. Kastanas, K. S. Kedlaya (student), O. P. Lossers (The 
Netherlands), L. E. Mattics, W. W. Meyer, J. P. Robertson & J. B. Robertson, R. Stong, M. Vowe (Switzerland), and 
the proposer. Six incorrect or incomplete solutions were received. 


A Maximum on the Boundary 


10222 [1992, 462]. Proposed by Gerry Myerson, Macquarie University, North Ryde, NSW, 
Australia. 


(a) Let h be a strictly increasing convex function on [0, 1]. Let n be a positive integer. 
Assume that $0 \le a_1 \le \cdots \le a_n \le 1$ and $0 \le x_1 \le \cdots \le x_n \le 1$. Prove that

$$\sum_{j=1}^{n} h(|x_j - a_j|) \le \max\Big(\sum_{j=1}^{n} h(a_j),\ \sum_{j=1}^{n} h(1 - a_j)\Big).$$

(b) Let $n$ be a positive integer and let $a_j = (2j - 1)/2n$ for $1 \le j \le n$. Assume that
$0 \le x_1 \le \cdots \le x_n \le 1$. Let $h$ be a strictly increasing, but not necessarily convex, function on
$[0, 1]$. Prove that

$$\sum_{j=1}^{n} h(|x_j - a_j|) \le \sum_{j=1}^{n} h(a_j).$$


Solution by Robin J. Chapman, University of Exeter, Exeter, U. K. (a) We use induction
on $n$. If $n = 1$ then either $x_1 \le a_1$ and so $h(|x_1 - a_1|) = h(a_1 - x_1) \le h(a_1)$, or $x_1 > a_1$
and so $h(|x_1 - a_1|) = h(x_1 - a_1) \le h(1 - a_1)$. Assume now that $n > 1$.

If $x_1 \le a_1$ then $h(|x_1 - a_1|) = h(a_1 - x_1) \le h(a_1)$ and the inductive hypothesis gives
either $\sum_{j=1}^{n} h(|x_j - a_j|) \le \sum_{j=1}^{n} h(a_j)$ or $\sum_{j=1}^{n} h(|x_j - a_j|) \le h(a_1) + \sum_{j=2}^{n} h(1 - a_j)$.
In the latter case if $a_1 \le 1/2$ then $a_1 \le 1 - a_1$, implying that $h(a_1) \le h(1 - a_1)$ and so
the result follows. On the other hand if $a_1 > 1/2$ then for $j > 1$, $a_j \ge a_1 > 1/2$ and
$1 - a_j < a_j$. Hence $h(a_1) + \sum_{j=2}^{n} h(1 - a_j) \le \sum_{j=1}^{n} h(a_j)$ and again the result follows.

A similar argument works if $x_n \ge a_n$.

Now if $x_1 > a_1$ and $x_n < a_n$ then there exists $k$ with $a_k < x_k \le x_{k+1} < a_{k+1}$. Hence
$h(|x_k - a_k|) + h(|x_{k+1} - a_{k+1}|) = h(x_k - a_k) + h(a_{k+1} - x_{k+1})$. Now as $h$ is convex,
if $b, c \ge 0$ and $b + c \le 1$, then

$$h(b) + h(c) \le \frac{c\,h(0) + b\,h(b + c)}{b + c} + \frac{b\,h(0) + c\,h(b + c)}{b + c} = h(0) + h(b + c).$$

It follows that

$$h(|x_k - a_k|) + h(|x_{k+1} - a_{k+1}|) \le h(0) + h(a_{k+1} - a_k) \le \min\big(h(a_k) + h(a_{k+1}),\ h(1 - a_k) + h(1 - a_{k+1})\big).$$

If $n = 2$, we are done; else, by induction

$$\sum_{j \ne k, k+1} h(|x_j - a_j|) \le \max\Big(\sum_{j \ne k, k+1} h(a_j),\ \sum_{j \ne k, k+1} h(1 - a_j)\Big)$$

and the result is now immediate.

(b) Let $I_j = [(j - 1)/n, j/n)$ for $1 \le j < n$ and $I_n = [(n - 1)/n, 1]$. Then if
$x_j \in I_{r_j}$ for each $j$, then $1 \le r_1 \le r_2 \le \cdots \le r_n \le n$. Also if $s_j = |r_j - j| + 1$ then
$|x_j - a_j| \le (2s_j - 1)/2n = a_{s_j}$. I claim that there is a permutation $\pi \in S_n$ such that
$s_j \le \pi(j)$ for all $j$. Given this claim it follows that

$$\sum_{j=1}^{n} h(|x_j - a_j|) \le \sum_{j=1}^{n} h(a_{s_j}) \le \sum_{j=1}^{n} h(a_{\pi(j)}) = \sum_{j=1}^{n} h(a_j),$$

as required.

It only remains to prove the claim. We again use induction on $n$. If $n = 1$ the claim is
trivial. Suppose then that $n > 1$. We first suppose that $r_n < n$. Then if $j \le n - 1$ then
$r_j \le n - 1$ and by induction there exists $\pi \in S_{n-1}$ with $s_j \le \pi(j)$ for $1 \le j \le n - 1$.
Also $s_n \le |1 - n| + 1 = n$ and so putting $\pi(n) = n$ we get the required permutation.
Now suppose that $r_n = n$. If $j < n$ put $r'_j = r_j$ if $r_j < n$ and $r'_j = n - 1$ if $r_j = n$. As
$1 \le r'_1 \le r'_2 \le \cdots \le r'_{n-1} \le n - 1$ then by induction there is a permutation $\rho \in S_{n-1}$
with $|r'_j - j| + 1 \le \rho(j)$ for $1 \le j < n$. Hence $s_j = |r_j - j| + 1 \le \rho(j) + 1$. Now
$s_n = |n - n| + 1 = 1$ and so if we put $\pi(j) = \rho(j) + 1$ for $j < n$ and $\pi(n) = 1$ then $\pi$ is
the required permutation.

Note that in neither (a) nor (b) do we need strict monotonicity of h. 


Editorial comment. The proposer used the assumption of strict inequality to guarantee 
that the sum would have a maximum attained at a single point. Most solvers succumbed to 
the temptation to simplify the argument by characterizing the point (x1, ...,X,) at which 
the maximum is attained. 


Solved also by M. V. Bjelica (Yugoslavia), M. Bowron, R. B. Israel (Canada), M. Mocsy (Hungary), K. Schilling, 
and the proposer. 


An Integral Infinite Sum 


10231 [1992, 570]. Proposed by Adrian Riskin, Northern Arizona University, Flagstaff, AZ. 


For positive integers $m$ and $n$, let

$$f(m, n) = \sum_{k=1}^{\infty} k^n \Big(\frac{m}{m + 1}\Big)^k.$$

(a) Prove that $f(m, n)$ is an integer.
(b) Show that the last digit of the decimal expansion of $f(1, n)$ can only be 0, 2, or 6.


Solution to part (a) by David Beckwith, Sag Harbor, NY. Let

$$g_n(x) = f(x, n) = \sum_{k=1}^{\infty} k^n \Big(\frac{x}{x + 1}\Big)^k.$$

The series converges uniformly on a positive interval; termwise differentiation yields

$$g_n'(x) = \sum_{k=1}^{\infty} k^n \cdot k \Big(\frac{x}{x + 1}\Big)^{k-1} \frac{1}{(x + 1)^2} = \frac{g_{n+1}(x)}{x(x + 1)}.$$

Hence $g_{n+1}(x) = x(x + 1)g_n'(x)$. By explicit computation, $g_0(x) = x$. By the recurrence,
every $g_n(x)$ is a polynomial in $x$ with integer coefficients. Hence every $f(m, n) = g_n(m)$
is an integer.


Solution to part (b) by Richard Holzsager, American University, Washington, DC. Starting
with $f(m, 1) = g_1(m) = m^2 + m$ and applying the recurrence, we obtain the additional
polynomials

$$f(m, 2) = 2m^3 + 3m^2 + m,$$

$$f(m, 3) = 6m^4 + 12m^3 + 7m^2 + m,$$

$$f(m, 4) = 24m^5 + 60m^4 + 50m^3 + 15m^2 + m,$$

$$f(m, 5) = 120m^6 + 360m^5 + 390m^4 + 180m^3 + 31m^2 + m.$$

Reducing the coefficients modulo 10, we have $f(m, 5) \equiv f(m, 1) \bmod 10$. Hence the
succeeding polynomials repeat mod 10 with a period of 4, for any fixed $m$. For $m = 1$, the
cycle is 2, 6, 6, 0.
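
The recurrence $g_{n+1}(x) = x(x + 1)g_n'(x)$ is easy to run mechanically. The sketch below is an illustration rather than part of either published solution; polynomials are stored as coefficient lists with the lowest degree first, and the names `next_poly`, `f_poly` and `f` are assumptions.

```python
def next_poly(g):
    """Apply g_{n+1}(x) = x (x + 1) g_n'(x) to a coefficient list (lowest degree first)."""
    deriv = [k * c for k, c in enumerate(g)][1:]   # coefficients of g'(x)
    out = [0] * (len(deriv) + 2)
    for k, c in enumerate(deriv):
        out[k + 1] += c                            # contribution of x * g'(x)
        out[k + 2] += c                            # contribution of x^2 * g'(x)
    return out

def f_poly(n):
    """Coefficients of f(m, n) as a polynomial in m, starting from g_0(x) = x."""
    g = [0, 1]
    for _ in range(n):
        g = next_poly(g)
    return g

def f(m, n):
    """Evaluate f(m, n) from its polynomial form."""
    return sum(c * m**k for k, c in enumerate(f_poly(n)))

print(f_poly(5))                            # coefficients of f(m, 5), constant term first
print([f(1, n) % 10 for n in range(1, 9)])  # last digits of f(1, n)
```

Reading the first printed list from the right reproduces the coefficients 120, 360, 390, 180, 31, 1 displayed above.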


Editorial comment. A popular method of solution was to show that

$$f(m, n) = (m + 1)\sum_{k=1}^{n} k!\,S(n, k)\,m^k,$$

where $S(n, k)$ denotes the Stirling numbers of the second kind. This establishes (a), and
$f(1, n) \equiv 2\sum_{k=1}^{4} k!\,S(n, k) \bmod 10$ leads to (b).

Gerry Myerson noted that $f(m, n)$ is an integer multiple of $m(m + 1)$ for $n > 0$. This
also follows from the selected proof. He also located $f(1, n)/2$ as sequence #1191 in
N. J. A. Sloane, Handbook of Integer Sequences, Academic Press, 1973, where the sequence
is traced back to Cayley. The number $f(1, n)/2$ is equal to the number of distributions
of $n$ distinct objects into ordered cells such that no occupied cell is above an unoccupied
cell. A proof of part (b) using this interpretation can be found in O. A. Gross, “Preferential
arrangements”, this MONTHLY 69 (1962), 4-8. This latter reference was mentioned by Istvan
Nemes.

William Y. C. Chen gave further references dealing with the question of periodicity of
the $f(1, n)$ modulo primes.


Solved by 54 solvers and the proposer. 


Mutually Convergent Series 


10291 [1993, 290]. Proposed by Howard Morris, Chatsworth, CA. 


Let $k$ be a positive integer and let $(x_n)$ be a nondecreasing sequence of real numbers for
which $\sum (1/x_n)$ converges. Show that $\sum (\ln x_n)^k/x_n$ converges if and only if $\sum (\ln n)^k/x_n$
converges.


Solution by Frank Schmidt, Arlington, VA. Since $x_n \le x_{n+1}$ and $\sum (1/x_n) < \infty$, we have
$\lim_{n\to\infty}(n/x_n) = 0$ (see Editorial Comment below). In particular, there is a real number $K$
with $n \le Kx_n$ for all $n$, so that $\sum (\log x_n)^k/x_n < \infty$ implies $\sum (\log n)^k/x_n < \infty$. To prove
the converse, split the indices into two subsets: (I) $x_n \le n^{k+2}$. (II) $x_n > n^{k+2}$. On subset (I),
$\log x_n \le (k + 2)(\log n)$, hence $\sum (\log n)^k/x_n < \infty$ implies $\sum_{\mathrm{I}} (\log x_n)^k/x_n < \infty$. As for
subset (II), for sufficiently large $n$ (depending on $k$), we have

$$\frac{(\log x_n)^k}{x_n} < \frac{(x_n)^{k/(k+1)}}{x_n} = \frac{1}{(x_n)^{1/(k+1)}} < \frac{1}{n^{(k+2)/(k+1)}}.$$

Since $(k + 2)/(k + 1) > 1$, $\sum_{\mathrm{II}} (\log x_n)^k/x_n < \infty$.


Editorial comment. The upper bound on $x_n$ may be obtained in many ways: most readers
gave a simple ad hoc proof; three readers appeared to treat it as obvious; and three readers
referred to a well-known result (theorems of Abel, Kronecker or Pringsheim) without citing
the statement that the solver had in mind. An editor supplied another approach: paraphrasing
theorem 3.27 (attributed to Cauchy) of Walter Rudin, Principles of Mathematical Analysis,
3rd edition, McGraw-Hill, 1976, we find that the given conditions imply $\sum 2^j/x_{2^j}$ converges,
from which the result follows.

H.-J. Seiffert observed that the upper bound on $x_n$ is also part of the general theory of
the convergence exponent $\lambda$ of $(x_n)$. For $x_n$ as in this problem, $\lambda$ is characterized by $\sum x_n^{-\sigma}$
converging for $\sigma > \lambda$ and diverging for $\sigma < \lambda$, so that $\lambda \le 1$ in this case. In G. Pólya
and G. Szegő, Problems and Theorems in Analysis, Vol. II, Springer-Verlag, 1972-76, pp.
25-26, entry 113, one finds that

$$\lambda = \limsup_{n\to\infty} \frac{\log n}{\log x_n}.$$

Although this leads to slightly weaker
inequalities than have been mentioned above, it is strong enough for the present needs.


Solved also by V. Bozin (student, Yugoslavia), D. A. Darling, E. Hertz, R. Holzsager, G. L. Isaacs, I. Kastanas, A. D. 
Melas (Greece), A. Pedersen (Denmark), H.-J. Seiffert (Germany), R. Stong, A. A. Tarabay (Lebanon), R. B. Tucker, 
H. V. Vu (student, Hungary), A. N. ’t Woord (the Netherlands), and the proposer. 
A Spigot Algorithm for the Digits of π


Stanley Rabinowitz and Stan Wagon 


It is remarkable that the algorithm illustrated in Table 1, which uses no floating-point
arithmetic, produces the digits of π. The algorithm starts with some 2s, in
columns headed by the fractions shown. Each entry is multiplied by 10. Then,
starting from the right, the entries are reduced modulo den, where the head of the
column is num/den, producing a quotient q and remainder r. The remainder is
left in place and q × num is carried one column left. This reduce-and-carry is
continued all the way left. The tens digit of the leftmost result is the next digit of
π. The process continues with the multiplication of the remainders by 10, the
reductions modulo the denominators, and the augmented carrying.


TABLE 1. The workings of an algorithm that produces digits of π. The dashed line indicates the key
step: starting from the right, entries are reduced modulo the denominator of the column head
(25, 23, 21,..., resp.), with the quotients, after multiplication by the numerator (12, 11, 10,...), carried
left. For example, the 20 in the 9/19 column yields a remainder of 1 and a left carry of 1 · 9 = 9. After
the leftmost carries, the tens digits are 3, 1, 4, 1. To get more digits of π one must start with a longer
string of 2s.


                     1/3   2/5   3/7   4/9  5/11  6/13  7/15  8/17  9/19 10/21 11/23 12/25
Initialize       2     2     2     2     2     2     2     2     2     2     2     2     2
× 10            20    20    20    20    20    20    20    20    20    20    20    20    20
Carry          +10   +12   +12   +12   +10   +12    +7    +8    +9    +0    +0    +0
                30    32    32    32    30    32    27    28    29    20    20    20    20
Remainders       0     2     2     4     3    10     1    13    12     1    20    20    20
× 10             0    20    20    40    30   100    10   130   120    10   200   200   200
Carry          +13   +20   +33   +40   +65   +48   +98   +88   +72  +150  +132   +96
                13    40    53    80    95   148   108   218   192   160   332   296   200
Remainders       3     1     3     3     5     5     4     8     5     8    17    20     0
× 10            30    10    30    30    50    50    40    80    50    80   170   200     0
Carry          +11   +24   +30   +40   +40   +42   +63   +64   +90  +120   +88    +0
                41    34    60    70    90    92   103   144   140   200   258   200     0
Remainders       1     1     0     0     0     4    12     9     4    10     6    16     0
× 10            10    10     0     0     0    40   120    90    40   100    60   160     0
Carry           +4    +2    +9   +24   +55   +84   +63   +48   +72   +60   +66    +0
                14    12     9    24    55   124   183   138   112   160   126   160     0

This algorithm is a “spigot” algorithm: it pumps out digits one at a time and
does not use the digits after they are computed. Moreover, the digits are generated
without any use of high-precision (or low-precision) operations on floating-point
real numbers; the entire algorithm uses only ordinary integer arithmetic on
relatively small integers. For example, to obtain the first 5,000 digits of π requires
only arithmetic operations on integers less than 600,000,000. Although high-precision
floating-point routines are built up from integer operations, the algorithms
in this paper are quite simple and do not simulate floating-point computations.

In order to motivate the π-algorithm, we first discuss the much simpler case of
e, for which a spigot algorithm was discovered by Sale [Sale]. His algorithm is the
basis of the discussion in §1.


1. A NUMBER SYSTEM IN WHICH e’s DIGITS ARE PERIODIC. A real number’s
decimal representation may be interpreted as an infinitely nested expression;
for example:

$$\sqrt{2} = 1.41421356\ldots = 1 + \tfrac{1}{10}(4 + \tfrac{1}{10}(1 + \tfrac{1}{10}(4 + \tfrac{1}{10}(2 + \tfrac{1}{10}(1 + \cdots))))).$$

Some interesting and useful representations may be obtained if we change the
base-sequence, which in the case above is $(\tfrac{1}{10}, \tfrac{1}{10}, \tfrac{1}{10}, \tfrac{1}{10}, \ldots)$. For example, using the
base $b = (\tfrac12, \tfrac13, \tfrac14, \tfrac15, \ldots)$ yields the following form, called a mixed-radix
representation (see [Knu, §4.1]):

$$a_0 + \tfrac12(a_1 + \tfrac13(a_2 + \tfrac14(a_3 + \tfrac15(a_4 + \tfrac16(a_5 + \cdots))))),$$

where the $a_i$ (the digits) are nonnegative integers. If $0 \le a_i \le i$ for $i \ge 1$, the
representation is called regular. Mixed-radix representations will be denoted by
$(a_0; a_1, a_2, a_3, a_4, \ldots)_b$. For base $b$, every positive real number has a regular
representation and representations are unique provided we exclude representations
that terminate with maximal digits (otherwise, for example, $\tfrac12 =
(0; 1, 0, 0, \ldots)_b = (0; 0, 2, 3, 4, 5, 6, \ldots)_b$); from now on and for all bases, we exclude
such representations. The proof of the following Lemma is in Appendix 1.


Lemma 1(a). If $i \ge 1$, $(0; 0, 0, \ldots, 0, a_i, a_{i+1}, \ldots)_b < 1/i!$; in particular,
$(0; a_1, a_2, a_3, a_4, \ldots)_b < 1$.

(b). Representations using the mixed-radix base $b$ are unique.

(c). The integer part of $(a_0; a_1, a_2, a_3, a_4, \ldots)_b$ is $a_0$ and the fractional part is
$(0; a_1, a_2, a_3, a_4, \ldots)_b$.


In this number system some irrationals become periodic. For example, $e =
(2; 1, 1, 1, 1, \ldots)_b$; this is just a restatement of the infinite series $\sum_{i=0}^{\infty} 1/i!$ as
$1 + \tfrac11(1 + \tfrac12(1 + \tfrac13(1 + \tfrac14(1 + \tfrac15(1 + \cdots)))))$. Rational numbers in this system
correspond to digit-sequences that terminate (Appendix 1, Lemma 2).
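
A finite mixed-radix expression can be evaluated exactly by working outward from the innermost parenthesis. The following is a minimal sketch using exact rationals; the function names `mixed_radix_value` and `base_b` are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction

def mixed_radix_value(a0, digits, base):
    """Exact value of a0 + b1*(a1 + b2*(a2 + ...)) for finitely many digits."""
    acc = Fraction(0)
    # Evaluate the nested expression from the innermost parenthesis outward.
    for a, b in zip(reversed(digits), reversed(base)):
        acc = b * (a + acc)
    return a0 + acc

def base_b(length):
    """First `length` elements of the base b = (1/2, 1/3, 1/4, ...)."""
    return [Fraction(1, k) for k in range(2, length + 2)]

# Truncating e = (2; 1, 1, 1, ...)_b after 15 digits gives 2 + 1/2! + ... + 1/16!.
approx_e = mixed_radix_value(2, [1] * 15, base_b(15))
print(float(approx_e))

# (0; 1)_b is exactly 1/2, while the truncated tail (0; 0, 2, 3, ..., 10)_b
# falls short of 1/2 by exactly 1/11!.
print(mixed_radix_value(0, [1], base_b(1)))
print(Fraction(1, 2) - mixed_radix_value(0, [0] + list(range(2, 11)), base_b(10)))
```

The last line shows why representations ending in maximal digits are excluded: the infinite tail would converge to the same value as (0; 1, 0, 0, ...).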

The decimal digits of a real number $x$ in [0, 10) can be obtained by taking the
integer part of $x$, multiplying its fractional part by 10, taking the integer part of the
result, multiplying the resulting fractional part by 10, and so on. In some mixed-radix
bases, this is especially simple. If $x = (a_0; a_1, a_2, \ldots, a_n)_b$, then $10x =
(10a_0; 10a_1, 10a_2, \ldots, 10a_n)_b$. The latter may not be a regular expression:
some digits may be too big. But we can decrease digits by reducing them modulo $i$,
where $i$ is the denominator of the corresponding element of $b$. Starting these
reductions at the right end, we carry the quotients left, eventually getting the
regular representation of $10x$. Thus multiplying by 10 is algorithmically straightforward.
Taking the integer and fractional parts for $b$-representations is also easy,
thanks to Lemma 1(c).
We can now give the algorithm to get the first n base-10 digits of e. A proof of
correctness (the error analysis showing that n + 2 mixed-radix digits suffice¹ to
get n base-10 digits) is given as Lemma 3 in Appendix 1.


Algorithm e-spigot


1. Initialize: Let the first digit be 2 and initialize an array A of length n + 1 to
   (1, 1, 1, ..., 1).
2. Repeat n − 1 times:
   Multiply by 10: Multiply each entry of A by 10.
   Take the fractional part: Starting from the right, reduce the ith entry of A
      modulo i + 1, carrying the quotient one place left.
   Output the next digit: The final quotient is the next digit of e.
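
The steps of Algorithm e-spigot translate almost line for line into code. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: the function name `e_spigot` is hypothetical, and the array is 0-based, so the entry at index `i` is the (i + 1)th entry and is reduced modulo `i + 2`.

```python
def e_spigot(n):
    """Return the first n base-10 digits of e as a list of ints, using integers only."""
    digits = [2]              # the leading digit is given
    a = [1] * (n + 1)         # mixed-radix digits of e - 2 in base (1/2, 1/3, ...)
    for _ in range(n - 1):
        carry = 0
        # Multiply by 10 and renormalize from the right, carrying quotients left.
        for i in range(n, -1, -1):
            carry, a[i] = divmod(a[i] * 10 + carry, i + 2)
        digits.append(carry)  # the final quotient is the next digit
    return digits

print(e_spigot(10))
```

Python integers are unbounded, so no care about word size is needed here; in a fixed-width language the largest intermediate value, roughly 10 times the largest modulus plus a carry, would have to fit in the integer type.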


The first few steps of this algorithm, starting with an array of 10 1s (this 
corresponds to 11 mixed-radix digits, good for 9 digits of e; only 5 are shown), are 
displayed in Table 2. 


TABLE 2. The workings of a spigot algorithm for the digits of e (in bold). The reductions in the
column headed 1/i are performed modulo i. The leftmost base-10 real numbers are the values of the
rows viewed as mixed-radix representations. Since only 11 mixed-radix digits start the algorithm, the
first base-10 number is only an approximation to e.


[Table body not recoverable from the scanned source. The surviving left-hand column gives the
base-10 values of successive rows: 2.718281826..., 7.18281826..., 0.18281826..., 1.8281826...,
0.8281826..., 8.281826..., 0.281826..., 2.81826..., 0.81826...]


¹Any digit-producing algorithm for a presumed-normal number x suffers from a drawback that,
although unlikely, can impinge on the result. If x is between 1 and 10 and the algorithm says that the
first 100 digits of x are, say, 4, 6, 5, 0, 7, ..., 3, 9, 9, 9, 9, 9 then one cannot be sure that the last 6 digits
are correct. They will be the digits of a certain approximation to x that is within $5 \cdot 10^{-100}$ of the true
value. One cannot simply go farther until a non-9 is reached, because memory allocations must be made
in advance. The user must realize that a terminating string of 9s is a red flag concerning those digits
and even with no 9s, the last digit might be incorrect. In practice, one might ask for, say, 6 extra digits,
reducing the odds of this problem to one in a million.


2. A SPIGOT FOR DIGITS OF π. The ideas of §1 lead to a spigot algorithm for
π, but there are additional complexities and additional interesting questions that
distinguish π from e. Our starting point is the following moderately well-known
series:

$$\pi = \sum_{i=0}^{\infty} \frac{(i!)^2\, 2^{i+1}}{(2i + 1)!}.$$

This series can be derived from the Wallis product for π; another approach uses
an acceleration technique called Euler’s transform applied to the series $\pi = 4 -
\tfrac43 + \tfrac45 - \tfrac47 + \cdots$. These proofs, together with three others and references to
earlier sources, may be found in [Li]. We let $k!!$ denote the product $1 \cdot 3 \cdot 5 \cdots k$
for odd integers $k$; then the series is equivalent to

$$\frac{\pi}{2} = \sum_{i=0}^{\infty} \frac{i!}{(2i + 1)!!} = 1 + \frac{1}{3} + \frac{1 \cdot 2}{3 \cdot 5} + \frac{1 \cdot 2 \cdot 3}{3 \cdot 5 \cdot 7} + \cdots,$$

which expands to become

$$\frac{\pi}{2} = 1 + \frac13\left[1 + \frac25\left[1 + \frac37\left(1 + \frac49(1 + \cdots)\right)\right]\right].$$

This last expression leads to the mixed-radix base $c = (\tfrac13, \tfrac25, \tfrac37, \tfrac49, \ldots)$, with respect
to which π is simply $(2; 2, 2, 2, 2, 2, \ldots)_c$. For a regular representation in base $c$,
the digit in the $i$th place must lie in the interval $[0, 2i]$. Unfortunately, base $c$ is less
accommodating than $b$.


Lemma 4 (Proof in Appendix 1). The base-c number with maximal digits,
$(0; 2, 4, 6, 8, \ldots)_c$, represents 2; hence regular representations of the form
$(0; a, b, c, \ldots)_c$ lie between 0 and 2.
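
Lemma 4 and the representation π = (2; 2, 2, 2, ...) in base c can both be checked on truncated digit strings. The sketch below uses exact rational arithmetic; the helper name `base_c_value` and the truncation length of 60 digits are assumptions made for illustration.

```python
import math
from fractions import Fraction

def base_c_value(a0, digits):
    """Exact value of a0 + (1/3)(a1 + (2/5)(a2 + (3/7)(a3 + ...))) for a finite digit list."""
    acc = Fraction(0)
    for i in range(len(digits), 0, -1):
        acc = Fraction(i, 2 * i + 1) * (digits[i - 1] + acc)
    return a0 + acc

m = 60
# Maximal digits 2, 4, 6, ..., 2m: the value approaches 2 from below.
gap = 2 - base_c_value(0, [2 * i for i in range(1, m + 1)])
print(float(gap))

# All digits equal to 2: the value approaches pi from below.
print(float(base_c_value(2, [2] * m)), math.pi)
```

Both truncations fall slightly short of their limits because every omitted term is positive.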


Lemma 4 implies that c-representations are not unique. For example,
$(0; 0, 4, 6, 8, \ldots)_c = 2 - \tfrac23 = \tfrac43$, whence $(0; 0, 2, 3, 4, \ldots)_c = \tfrac23 = (0; 2, 0, 0, 0, \ldots)_c$.
More relevant algorithmically, integer and fractional parts using $c$ are not straightforward,
as they are for $b$. The integer part of $(a_0; a_1, a_2, \ldots)_c$ is either $a_0$ or
$a_0 + 1$ according as $(0; a_1, a_2, \ldots)_c$ is in [0, 1) or [1, 2). This problem is surmounted
by leaving the units digit of $a_0$ in place during the next iteration and calling the
tens digit of $a_0$ a predigit. The predigits must be temporarily held because
occasionally (once every 20 iterations, roughly) the next predigit is a 10; this will
happen when the carry, which is between 0 and 19, is at least 10 and,
simultaneously, the leftover units digit of $a_0$ is 9, which becomes 90 in the
multiply-by-10 step. This event requires that the held number be increased by 1
before being released. Specific details of the algorithm follow; the presentation at
the beginning of this paper sidestepped the problem of the occasional 10. The
proof that $\lfloor 10n/3 \rfloor$ mixed-radix digits suffice for n digits of π is in Appendix 1
(Lemma 5). Appendix 2 contains a Pascal implementation of this algorithm.


Algorithm π-spigot


1. Initialize: Let A = (2, 2, 2, 2, ..., 2) be an array of length ⌊10n/3⌋.
2. Repeat n times:

   Multiply by 10: Multiply each entry of A by 10.

   Put A into regular form: Starting from the right, reduce the ith element of A
      (corresponding to c-entry (i − 1)/(2i − 1)) modulo 2i − 1, to get a
      quotient q and a remainder r. Leave r in place and carry q(i − 1) one
      place left. The last integer carried (from the position where i − 1 = 1)
      may be as large as 19.

   Get the next predigit: Reduce the leftmost entry of A (which is at most
      109 [= 9 · 10 + 19]) modulo 10. The quotient, q, is the new predigit of
      π, the remainder staying in place.

   Adjust the predigits: If q is neither 9 nor 10, release the held predigits as true
      digits of π and hold q. If q is 9, add q to the queue of held predigits. If
      q is 10 then:

      • set the current predigit to 0 and hold it;
      • increase all other held predigits by 1 (9 becomes 0);
      • release as true digits of π all but the current held predigit.

This algorithm uses only integer arithmetic and is easy to program. The table at
the beginning of the paper shows it in action, starting with 13 mixed-radix digits of
π (good for 4 base-10 digits). To clarify the working of the algorithm, note that the
(finite) first row of Table 1 is a mixed-radix representation of 3.1414796...,
the second row represents 31.414796..., the fifth row represents 1.414796..., the
sixth row is 14.14796..., the ninth row is 4.14796..., and so on. Table 3 shows
the result of a computation using a larger initial array; the holding aspect does not
become relevant until the 32nd digit.


TABLE 3. The actual digits of π (bottom) compared to the sequence of leftmost base-c digits for 35
iterations with a starting array of 116 2s (good for 35 digits). At the 32nd iteration a 102 shows up,
yielding a predigit of 10.


30 13 41 15 58 92 26 64 53 35 58 89 97 78 92 32 23 38 84 45 62 26 63 42 33 38 82 32 27 78 94 49 102 28 87 


31415926535897932384626433832795 028 


We repeat that the algorithm uses only integer operations. To get 5,000 digits of
π requires only integer arithmetic on numbers less than 600,000,000. The algorithm
leads naturally to the question of improving it to one that is essentially as
simple as e-spigot.


Question. Is there a base d of rationals such that π has a d-representation that is
periodic, or an arithmetic progression, and such that $a_0$ is always the integer part
of $(a_0; a_1, a_2, \ldots)_d$?


Gosper [Gos, p. 32] has discovered a series for π that brings us tantalizingly
close to spigot-perfection:

$$\pi = 3 + \frac{1}{60}\,8 + \frac{1}{60}\,\frac{2 \cdot 3}{7 \cdot 8 \cdot 3}\,13 + \frac{1}{60}\,\frac{2 \cdot 3}{7 \cdot 8 \cdot 3}\,\frac{3 \cdot 5}{10 \cdot 11 \cdot 3}\,18 + \frac{1}{60}\,\frac{2 \cdot 3}{7 \cdot 8 \cdot 3}\,\frac{3 \cdot 5}{10 \cdot 11 \cdot 3}\,\frac{4 \cdot 7}{13 \cdot 14 \cdot 3}\,23 + \cdots$$

He obtained this series by using a refinement of the Euler transform on $4 - \tfrac43 +
\tfrac45 - \tfrac47 + \cdots$. Gosper’s series leads to the base $d = (\tfrac{1}{60}, \tfrac{6}{168}, \tfrac{15}{330}, \tfrac{28}{546}, \ldots)$, with
respect to which π is $(3; 8, 13, 18, \ldots)_d$. A computation shows that
$(0; 59, 167, 329, 545, \ldots)_d = 1.092\ldots$, a substantial improvement over the 2 that
arose for c. Under the usual randomness assumption for π’s digits, the odds of a
bad predigit in base c are 1 in 20, while in base d they decrease to less than 1 in
110; this is because a d-predigit of 10 occurs only when the remainder is a 9 (which
becomes 90) and the carry is a 10. The former happens 10% of the time, while the
latter happens no more than once in 11 iterations because the carry is the integer
part of a real between 0 and 10.93. So base d is within 1% of spigot-perfection.
Because Gosper’s series converges more quickly than the one we used, it has less
memory requirements: n digits of π require an initial array of length n; however,
the arithmetic on the array will involve integers larger than those in an array of the
same size using base c.

One way to improve the Gosper-series approach is to reduce the fractions in d 
to lowest terms. Then the regular number with maximal digits is (0; 59, 27, 21, 
38,...)Ja, which equals 1.0000476468.... It is not hard to see that the regular 
representation of 7 is unchanged in this new base. However, the work expended 
in reducing to lowest terms outweighs the gain made in reducing the number of 
times a 10 appears as a predigit. Thus it is likely that an affirmative answer to the 
question above is of more theoretical than practical interest. 

The spigot algorithm for π is by no means competitive with the recently
discovered fast algorithms (due to the Borwein brothers, the Chudnovsky brothers,
and others) that have been used to compute hundreds of millions of digits of π
(see [BBB]). But the spigot algorithm does have the advantage of avoiding all
floating-point computations; thus it is easily implemented on a home computer
where it can produce thousands of digits in a few minutes. Moreover, it gives the
result directly in base 10 (most other π-algorithms produce the result in binary or
some internal format and a second pass must be made to obtain decimal digits).

The algorithm given here can be made to run faster by outputting multiple 
digits at a time. For example, to get five decimal digits at a time, simply compute 
the digits of π using base 100,000. This can be done by multiplying by 100,000
instead of 10 in the main step. The integer part is then the next “digit” in base 
100,000. 

If one is working in base 100,000 and knows in advance that the portion of digits
to be computed does not contain the string 00000, then one can omit the lengthy
part of the algorithm that adjusts the predigits. This can lead to an exceedingly
short computer program. For example, Rabinowitz [Rab] used this idea to exhibit a
14-line Fortran program that outputs 1,000 decimal digits of π.

Finally, we mention that the algorithm can be parallelized, in which case it 
becomes blindingly fast up to about 10,000 digits. 

For examples of spigot algorithms for other functions, see [Abd]. 


APPENDIX 1. FIVE LEMMAS 
Lemma 1(a). If $i \ge 1$, $(0; 0, 0, \ldots, 0, a_i, a_{i+1}, \ldots)_b < 1/i!$; in particular,
$(0; a_1, a_2, a_3, a_4, \ldots)_b < 1$.

(b). Representations using mixed-radix base b are unique.

(c). The integer part of $(a_0; a_1, a_2, a_3, a_4, \ldots)_b$ is $a_0$ and the fractional part is
$(0; a_1, a_2, a_3, a_4, \ldots)_b$.


Proof: (a). It suffices to prove that $\sum_{k=i+1}^{\infty} (k-1)/k! = 1/i!$, which follows from
the fact that the series telescopes to:

\[
\left(\frac{1}{i!} - \frac{1}{(i+1)!}\right) + \left(\frac{1}{(i+1)!} - \frac{1}{(i+2)!}\right) + \left(\frac{1}{(i+2)!} - \frac{1}{(i+3)!}\right) + \cdots .
\]

(b). Suppose $(a_0; a_1, a_2, a_3, a_4, \ldots)_b$ and $(c_0; c_1, c_2, c_3, c_4, \ldots)_b$ represent the
same real number. Then, for some i, $0 = \sum_{k=i}^{\infty} d_k/k!$, where $|d_k| < k$ and $d_i \neq 0$.
But then $|d_i|/i! \le \sum_{k=i+1}^{\infty} |d_k|/k!$, contradicting (a).

(c). This follows from (a). 


Lemma 2. A positive number is rational iff its digits using the mixed-radix base b are 
eventually 0. 


Proof: The reverse direction is obvious. For the forward direction we use a 
sublemma. 


Sublemma. For any integers t and n, with $0 \le n < t!$, there are integers $d_i$ in $[0, i]$
such that $n = d_1 t(t-1)(t-2) \cdots 4 \cdot 3 + d_2 t(t-1)(t-2) \cdots 5 \cdot 4
+ \cdots + d_{t-3} t(t-1) + d_{t-2} t + d_{t-1}$.


Proof: By induction on t. If $n < t!$ write n as $qt + r$ with $0 \le r < t$ and $0 \le q <
(t-1)!$. By induction there is a sequence $(d_1, d_2, \ldots, d_{t-3}, d_{t-2})$ that is a solution
for q with respect to terms $(t-1)(t-2) \cdots 4 \cdot 3$, and the like, whence
$(d_1, d_2, \ldots, d_{t-3}, d_{t-2}, r)$ is a solution for n w.r.t. the terms $t(t-1)(t-2) \cdots 4 \cdot 3$,
and the like.


Returning to Lemma 2’s proof, suppose a positive rational s/t is given. Use the
sublemma to express $s(t-1)!$ in the form $d_1 t(t-1)(t-2) \cdots 4 \cdot 3 + d_2 t(t-1)(t-2) \cdots 5 \cdot 4
+ \cdots + d_{t-3} t(t-1) + d_{t-2} t + d_{t-1}$. Dividing by $t!$ then
yields a representation of s/t as a sum of reciprocals of factorials with appropri- 
ately small coefficients, which is the same as a terminating representation in the 
mixed-radix base b. 
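
The terminating expansion promised by Lemma 2 can be watched in a small computation. The following Python sketch is not part of the original paper, and its function names are ours. It assumes that the digit $a_i$ carries weight $1/(i+1)!$, and it extracts the digits of the fractional part by multiplying successively by 2, 3, 4, ... in exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial

def mixed_radix_digits(x, max_digits=200):
    """Digits a_1, a_2, ... of the fractional part of a positive Fraction x
    in the mixed-radix base b = (1/2, 1/3, 1/4, ...)."""
    frac = x - x.numerator // x.denominator   # drop the integer part
    digits = []
    radix = 2
    while frac != 0 and len(digits) < max_digits:
        frac *= radix
        digit = frac.numerator // frac.denominator
        digits.append(digit)
        frac -= digit
        radix += 1
    return digits

def from_digits(digits):
    """Value of (0; a_1, a_2, ...)_b, i.e. the sum of a_i / (i+1)!."""
    return sum(Fraction(d, factorial(i + 2)) for i, d in enumerate(digits))

print(mixed_radix_digits(Fraction(3, 4)))   # [1, 1, 2]
print(mixed_radix_digits(Fraction(1, 3)))   # [0, 2]
```

For 3/4 the digits are 1, 1, 2, that is, 3/4 = 1/2! + 1/3! + 2/4!, and the expansion stops as the lemma predicts.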


Lemma 3. The algorithm for digits of e is correct. 


Proof: It must be shown that n + 2 mixed-radix digits of e suffice to get n base-10
digits of e. We first prove that if $n \ge 28$ $(= \lceil 10e \rceil)$, then n mixed-radix digits
suffice for n base-10 digits. Using n mixed-radix digits means we are actually
getting the base-10 digits of $e_n = (2; 1, 1, 1, \ldots, 1)_b = \sum_{i=0}^{n} 1/i!$. Thus we must show
that $e - e_n < 5 \cdot 10^{-n}$ (see footnote at beginning of paper). A geometric series
estimation of the tail of the series shows that $e - e_n < 2/(n+1)!$, and then Stirling’s
formula yields

\[
\frac{2}{(n+1)!} < \frac{1}{n!} < \left(\frac{e}{n}\right)^{n} < \left(\frac{1}{10}\right)^{n}.
\]


If n < 28 then a direct computation of the digits shows that n + 2 mixed-radix 
digits suffice. 


Lemma 4. The base-c number with maximal digits, $(0; 2, 4, 6, 8, \ldots)_c$, represents 2;
hence regular representations of the form $(0; a, b, c, \ldots)_c$ lie between 0 and 2.


Proof: Instead of giving a formal proof, we show how some Mathematica computations
led to the result (and a proof). In terms of series, the lemma states that

\[
2 = \sum_{i=1}^{\infty} \frac{(2i)\, i!}{(2i+1)!!}, \qquad \text{where } (2i+1)!! = 1 \cdot 3 \cdot 5 \cdots (2i+1).
\]

A rough calculation showed that the sum is near 2. Then a rational computation of
the remainders (the differences between the partial sums and 2) yielded the
following sequence.

\[
\frac{4}{3},\ \frac{4}{5},\ \frac{16}{35},\ \frac{16}{63},\ \frac{32}{231},\ \frac{32}{429},\ \frac{256}{6435},\ \frac{256}{12155},\ \frac{512}{46189},\ \frac{512}{88179}
\]

The pattern in these remainders was found by dividing each by the preceding one,
which yielded:

\[
\frac{3}{5},\ \frac{4}{7},\ \frac{5}{9},\ \frac{6}{11},\ \frac{7}{13},\ \frac{8}{15},\ \frac{9}{17},\ \frac{10}{19},\ \frac{11}{21}
\]

Induction proves the pattern to be valid in general; it follows that the remainders
have the closed form $2^{n+1} / \binom{2n+1}{n}$, which converges to 0, as claimed.
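
The rational computation described in this proof is easy to repeat. The sketch below uses Python's exact fractions in place of Mathematica; it is our illustration, not the authors' session, and it assumes the series has the terms $(2i)\,i!/(1 \cdot 3 \cdot 5 \cdots (2i+1))$. It computes the first ten remainders, their successive ratios, and compares the remainders with the closed form $2^{n+1}/\binom{2n+1}{n}$.

```python
from fractions import Fraction
from math import comb

def remainders(count):
    """Exact values of 2 minus the partial sums of sum_{i>=1} (2i) i! / (1*3*5*...*(2i+1))."""
    result = []
    partial = Fraction(0)
    fact = 1            # i!
    odd_product = 1     # 1*3*5*...*(2i+1)
    for i in range(1, count + 1):
        fact *= i
        odd_product *= 2 * i + 1
        partial += Fraction(2 * i * fact, odd_product)
        result.append(2 - partial)
    return result

rems = remainders(10)
ratios = [rems[k + 1] / rems[k] for k in range(len(rems) - 1)]
closed_form = [Fraction(2 ** (n + 1), comb(2 * n + 1, n)) for n in range(1, 11)]

print([str(r) for r in rems])
print([str(r) for r in ratios])
print(rems == closed_form)
```

The numerators of the printed remainders are 4, 4, 16, 16, 32, 32, 256, 256, 512, 512, and the ratios run from 3/5 to 11/21.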


Lemma 5. The algorithm for digits of π is correct.


Proof: As for e, we look at $\pi - \pi_m$, where $\pi_m = (2; 2, 2, \ldots, 2)_c$. This error is the
tail of our main series for π: $\sum_{i=m}^{\infty} (i!)^2 2^{i+1}/(2i+1)!$. This tail is less than twice its
first term since each subsequent term is less than half its predecessor, leading us to
study $(m!)^2 2^{m+2}/(2m+1)!$. Splitting the denominator into evens and odds turns
this into $m!\, 2^2/(3 \cdot 5 \cdots (2m+1))$, which is less than $\tfrac{2}{3}\, m!\, 2^2/(2 \cdot 4 \cdots (2m))$, or
$1/(3 \cdot 2^{m-3})$. It is easy to see (using the fact that $\tfrac{3}{10} < \log_{10} 2$) that this last is less
than $5 \cdot 10^{-n}$ when $m = \lfloor 10n/3 \rfloor$, as claimed.
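
The final inequality can also be checked directly for a range of n. The following Python sketch is a numerical sanity check, not a substitute for the proof. It evaluates twice the first omitted term, $(m!)^2 2^{m+2}/(2m+1)!$ with $m = \lfloor 10n/3 \rfloor$, in exact arithmetic and compares it with $5 \cdot 10^{-n}$. The function names and the range 1 to 200 are our choices.

```python
from fractions import Fraction
from math import factorial

def tail_bound(m):
    """Twice the first omitted term of the series: (m!)^2 * 2^(m+2) / (2m+1)!."""
    return Fraction(factorial(m) ** 2 * 2 ** (m + 2), factorial(2 * m + 1))

def bound_holds(n):
    """Check that the tail bound is below 5 * 10^(-n) when m = floor(10n/3)."""
    m = 10 * n // 3
    return tail_bound(m) < Fraction(5, 10 ** n)

print(all(bound_holds(n) for n in range(1, 201)))
```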


APPENDIX 2. PASCAL CODE 

The following program, for which we are grateful to Macalester student Simeon
Simeonov, implements the algorithm π-spigot. This code makes use of the fact that
the queue of predigits always has a pile of 9s to the right of its leftmost member,
and so only this leftmost predigit and the number of 9s need be remembered. The
program computes 1000 digits of π and requires a version of Pascal with a longint
data type (32-bit integer).


Program Pi_Spigot;

const n   = 1000;
      len = 10*n div 3;
var i, j, k, nines, predigit : integer;
    q, x : longint;                            {longint guards against 16-bit overflow}
    a : array[1..len] of longint;
begin
for j := 1 to len do a[j] := 2;                {Start with 2s}
nines := 0; predigit := 0;                     {First predigit is a 0}
for j := 1 to n do
  begin q := 0;
    for i := len downto 1 do                   {Work backwards}
      begin
        x := 10*a[i] + q*i;
        a[i] := x mod (2*i-1);
        q := x div (2*i-1);
      end;
    a[1] := q mod 10; q := q div 10;
    if q = 9 then nines := nines + 1
    else if q = 10 then
      begin write(predigit+1);
        for k := 1 to nines do write(0);       {zeros}
        predigit := 0; nines := 0
      end
    else begin
      write(predigit); predigit := q;
      if nines <> 0 then
        begin
          for k := 1 to nines do write(9);
          nines := 0
        end
      end
  end;
writeln(predigit);
end.
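
For readers without a Pascal compiler, the same bookkeeping can be sketched in Python. This is our transliteration of the program above and not code from the paper. It keeps the array length of 10n div 3, stores one held predigit plus a count of pending 9s, and collects the digits in a list instead of writing them. The function name and the decision to drop the initial placeholder predigit 0 are our own.

```python
def pi_spigot(n):
    """Decimal digits of pi as a string, following the Pascal program step by step."""
    length = 10 * n // 3
    a = [2] * (length + 1)        # a[1..length]; a[0] is unused to keep Pascal's indexing
    out = []
    nines, predigit = 0, 0        # the first predigit is a placeholder 0
    for _ in range(n):
        q = 0
        for i in range(length, 0, -1):       # work backwards
            x = 10 * a[i] + q * i
            a[i] = x % (2 * i - 1)
            q = x // (2 * i - 1)
        a[1] = q % 10
        q //= 10
        if q == 9:
            nines += 1                        # hold the 9: a later carry may change it
        elif q == 10:
            out.append(predigit + 1)          # the carry bumps the held predigit
            out.extend([0] * nines)           # and turns the held 9s into 0s
            predigit, nines = 0, 0
        else:
            out.append(predigit)              # the held digits are now safe to release
            out.extend([9] * nines)
            predigit, nines = q, 0
    out.append(predigit)
    return "".join(str(d) for d in out[1:])   # drop the placeholder 0

print(pi_spigot(100)[:40])
```

Python integers do not overflow, so the 32-bit requirement of the Pascal version does not arise here.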


ADDED IN PROOF. The latest version of Mathematica (2.3) can sum many of the series that occur in
this paper. It takes only a second or so to get π/2 as the sum of the crucial series at the beginning of
section 2, to get 1/i! for the series in Lemma 1’s proof, and to get 2 as the sum of the series in Lemma 
4’s proof. 
The Rise, Fall, and Possible
Transfiguration of Triangle Geometry: 
A Mini-history 


Philip J. Davis 


For Deborah Tepper Haimo 
In Friendship 


“Es ist in der That bewundernswuerdig, dass eine so einfache Figur, wie das Dreieck, so 
unerschoepflich an Eigenschaften ist. Wie viele noch unbekannte Eigenschaften anderer Figuren 
mag es nicht geben.” A.L. Crelle (1780-1855), Sammlung, v. I, 1821, p. 176. 

[It is indeed wonderful that so simple a figure as the triangle is so inexhaustible in its 
properties. How many as yet unknown properties of other figures may there not be?] 


1. INTRODUCTION. In the great Encyklopaedie der Mathematischen Wissen- 
schaften, put out under the general editorship of Felix Klein, there will be found a 
hundred page article, completed in the Fall of 1914, on contemporary triangle 
geometry. (G. Berkhan and W. Fr. Meyer: Neuere Dreiecksgeometrie, Vol. III AB
10, pp. 1173-1276.) On finding this article, the reader’s eyebrows may be elevated: 
what on earth is triangle geometry? If the reader then goes to the index of the 
Mathematical Reviews for enlightenment, he will not find the term triangle 
geometry among the hundred or so subsets into which its coverage has been 
partitioned. Differential geometry, yes; convex geometry, yes; finite geometry, yes; 
triangle geometry, no. Yet, the Encyklopaedie devoted one of its major articles to 
this topic. F. Cajori, in his 1907 history of mathematics, devoted a half dozen pages 
to it. So what is going on here? The subsumption of this topic by another one? Or 
the essential death of a topic? 

What, in fact, is triangle geometry? According to the Encyklopaedie authors, it is
not easy to define the subject logically, but it seems to boil down to this: given an 
arbitrary triangle, certain points (and lines and curves) are then determined which 
have remarkable properties with respect to the triangle. Instances of such points 
are the incenter, the circumcenter, the orthocenter, and the center of gravity of the 
triangle. By way of reminder, the first three are, respectively, the intersection of 
the internal angle bisectors, the intersection of the perpendicular bisectors of the 
triangle sides, and the intersection of the three altitudes. These four points were 
studied in antiquity. 

In 1803, a mathematician by the name of Kluegel dubbed these points the four 
distinguished (or remarkable) points of the triangle (merkwuerdige Punkte). In the 
years that followed, a great many distinguished points, lines, circles, and conics, 
have been unearthed; so many, in fact, that Berkhan and Meyer despaired of 
counting them all. A point, line, circle, or conic, if sufficiently distinguished, 
merited a special name, and so we have, as some further examples, the Fermat 
point, the Torricelli point, the Gergonne point, the Brocard points and circle, the
Lemoine point and circle, the nine point circle, the Euler line, the symmedian
point, the Steiner point, etc. etc. In Kimberling (“Central points...”) will be
found a listing of more than one hundred such distinguished objects. Not only 
special names were given, but also, quite understandably, histories of individual 
distinguished objects were written (e.g., Mackay wrote short histories of the 
symmedian point and of the nine-point circle. A recent book by Baptist presents 
many facts about the development of triangle geometry in the 19th century). 

Accordingly, Berkhan and Meyer proposed as a definition of triangle geometry, 
“the study of distinguished points, lines, circles and conics of a triangle”, leaving,
as far as I can see, the definition of what is distinguished or remarkable about a 
point to one’s subjective judgment. 

A somewhat more sophisticated definition comes from Felix Klein himself (in 
his famous Erlanger Programm) and says that triangle geometry is the invariant 
theory of five points under the projective group. Perhaps this definition is less 
vague, but I don’t think that it catches the flavor of the subject as it has been 
pursued historically. 

Triangle geometry as a distinguished subfield of mathematics seems to have 
emerged in the 1870’s in the writings of E. Lemoine, and if one considers the field 
both forward and backward in time from that date, it will be found that many 
distinguished mathematicians have contributed a little something to it. I leave to 
the reader’s subjective judgment what constitutes a distinguished mathematician. 
Among the books that are wholly or partly devoted to this topic, one may cite
Alasia (written with the encouragement of the famous geometer Eugenio Beltrami,
then the President of the Reale Accademia dei Lincei), which contains,
among other things, 566 metric formulas relating to the triangle and its
distinguished points! Some other books are: Altshiller-Court, Casey, Coolidge,
Emmerich, Johnson. In moments of euphoria, some of these authors viewed triangle
geometry as the new and fulfilled Euclid, very much as the New Testament has 
been claimed as the fulfillment of the Old Testament. 

One of the classic results of triangle geometry is the nine point circle theorem, 
which goes back, in part, to Poncelet in 1820. This theorem asserts that, given a 
triangle, the following nine points are concyclic (i.e., lie on one circle): the three
midpoints of the sides, the three altitude feet, and the three midpoints along the altitudes
from the vertices to the orthocenter. This is only one of the remarkable properties 
that this circle has; for example, it is tangent to the inscribed circle and the three 
escribed circles of the triangle. When one comes across this theorem in geometry 
for the first time, there is a certain surprise associated with it. One gets the feeling: 
what wonderful coincidences! However, there is nothing that dissipates such a 
feeling more quickly than to see the geometric theorem reduced to an algebraic 
identity or to have it placed in a more general context. 

There is a great deal more that can be said about the nine-point circle. Some 
authorities have asserted that there are no fewer than 43 distinguished points lying 
on the nine-point circle. The nine-point circle theorem has generated a small 
mathematical industry. (See Gallatly). This, in itself, should now occasion no 
surprise, considering that a few well-chosen axioms such as those of group theory 
can generate a major mathematical industry. Moreover, it will give the reader an 
idea of the high regard accorded the nine-point circle to learn that some years ago, 
the distinguished analyst Dame Mary Cartwright told me that when she went up to 
Cambridge as a student (c. 1920), she was expected to know two different proofs of 
it. 
One treatise on triangle geometry (Emmerich) presents the subject from the
point of view of the Brocard points. I will refrain from giving definitions because 
the whole Brocard theory never sent me into raptures. But I shall mention one 
theorem that did; and when I was in high school, I cut my mathematical teeth on 
it. 

“Napoleon’s Theorem”: On the sides of an arbitrary triangle T, erect three
equilateral triangles outwardly. Then:

(1) The three centers of the equilateral triangles are themselves the vertices of an
equilateral triangle. (Napoleon’s Triangle.)

(2) The three lines joining the vertices of the equilateral triangles to the
opposite vertices of T are concurrent. They meet in a point P known as the inner
isogonic point of T. That is, P is the unique point in T at which the sides of T
subtend equal angles of 2π/3.

(3) These three line segments are of equal length. 

(4) Similar statements hold when the equilateral triangles are constructed inwardly
on the initial triangle. 

(5) The inner and outer Napoleon Triangles have the same center. 

(6) The areas of the inner and outer Napoleon triangles differ by the area of the 
initial triangle. 

These are merely a few of the remarkable things that are associated with the 
Napoleon configuration or its generalizations. (See, e.g., Court, pp. 105-107, 
Sommerville, p. 165, Forder, p. 40, Hofstadter. And see Wetzel for a recent article 
containing new results and an extensive bibliography.) 
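
Statements (1), (5), and (6) lend themselves to a quick numerical check with complex numbers. The following Python sketch is our illustration and not taken from any of the cited references. It treats the vertices as complex numbers, erects the equilateral triangles by rotating each side through ±π/3, and uses floating-point arithmetic, so agreement is only up to rounding. The sample triangle and all function names are arbitrary choices.

```python
import cmath
import math

def signed_area(p, q, r):
    """Signed area of triangle pqr (positive when p, q, r run counterclockwise)."""
    return ((q - p).conjugate() * (r - p)).imag / 2

def napoleon_centers(a, b, c, outward=True):
    """Centers of the equilateral triangles erected on the sides of triangle abc."""
    ccw = signed_area(a, b, c) > 0
    angle = -math.pi / 3 if ccw == outward else math.pi / 3
    rot = cmath.exp(1j * angle)
    centers = []
    for p, q in ((a, b), (b, c), (c, a)):
        apex = p + (q - p) * rot          # third vertex of the equilateral triangle on pq
        centers.append((p + q + apex) / 3)
    return centers

def side_lengths(tri):
    p, q, r = tri
    return [abs(q - p), abs(r - q), abs(p - r)]

a, b, c = 0 + 0j, 4 + 1j, 1 + 3j
outer = napoleon_centers(a, b, c, outward=True)
inner = napoleon_centers(a, b, c, outward=False)

print(side_lengths(outer))                                   # (1): three equal sides
print(side_lengths(inner))
print(sum(outer) / 3, sum(inner) / 3)                        # (5): common center
print(abs(signed_area(*outer)) - abs(signed_area(*inner)),   # (6): difference of areas
      abs(signed_area(a, b, c)))
```

Changing a, b, c and rerunning is exactly the kind of playing around discussed in this article; a check of this sort suggests a theorem but does not prove it.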

I think that these examples should give the reader a good feeling for what 
triangle geometry is all about. I refer to the Encyklopaedie article of Berkhan and 
Meyer for a number of more complicated developments and to older bibliographi- 
cal references. For additional recent references, see Kimberling. 

How were the theorems of triangle geometry discovered? The mathematical 
literature, in general, is not often forthright as to how its material emerges. I can 
only conjecture that as with much of mathematics, it emerged from long hours of 
“playing around”. Playing around synthetically, in coordinate free fashion, the way
that Euclid is written up; but also playing around with algebra, trigonometry, and
rectangular, oblique, homogeneous, barycentric, trilinear, complex, conjugate,
projective coordinates; all have been employed at one time or another. In textbooks
such as Altshiller-Court, which are positioned as “advanced Euclid”, the synthetic
approach is strong. 

But another sort of playing around undoubtedly took place. The figures of 
triangle geometry can be drawn relatively easily and fairly accurately with a ruler 
and compass. I conjecture that a number of theorems were discovered visually in 
this way. Accurate computer graphics are now available for this sort of playing 
around, or to give this old and important activity its current gentrified name: 
mathematical experimentation. 


2. TRIANGLE GEOMETRY BECOMES A MUSEUM PIECE. In a certain sense, 
the high regard accorded to triangle geometry culminated with the Encyklopaedie 
article. One of the two authors of the article (Berkhan) fell on the battlefield of 
World War I at the age of 32, his mathematical potential unrealized. As though 
prophetically, the subject itself hardly survived that war. 

In the USA, triangle geometry was known as advanced geometry or college 
geometry. Courses were offered wherever there was a faculty devotee. Textbooks 
were written (e.g., N. Altshiller-Court, R. A. Johnson). In Sommerville’s 1924 book 
on conics, a UK/New Zealand text, many theorems of triangle geometry were
“downgraded” to the position of exercises for the student. In more recent years, a 
number of the theorems of triangle geometry appear in Coxeter, but there is no 
attempt to categorize them as such. The role of triangle geometry in European 
mathematical education has been detailed by Baptist. 

Here is the 1940 judgment of Eric Temple Bell on the subject: 


“The geometers of the 20th Century have long since piously removed all these treasures to the 
museum of geometry where the dust of history quickly dimmed their luster.” (The Development 
of Mathematics, p. 323) 


Joseph Malkevitch, in a recent article that attempts to revive all kinds of geometry 
in the curriculum, lists fifty-eight subfields of geometry. Subfield No. 23, called 
Geometric Extremal Problems, lists the Fermat-Steiner Point. Other than that, 
there is nothing on triangle geometry on his list. Geometric interest, even when 
visual, turned elsewhere. 

Yet, the subject of triangle geometry and its generalization, polygon geometry, 
(I often use the phrase triangle geometry to include this generalization), was and is 
still a steady source of problems for the entertainment and enjoyment of problem 
buffs who read the American Mathematical Monthly, the Mathematical Gazette, 
Crux Mathematicorum and similar periodicals in this and other countries. Over the 
years, one man, V. Thebault, contributed a thousand problems in the area. 

Dozens of papers on these subjects have appeared—not just in the problem 
solving context—often displaying ingenious new approaches and new connections. 
Thus, Jesse Douglas presented a complex variable approach. I. J. Schoenberg 
exploited both complex variables and the discrete Fourier transform. Chang and 
Davis looked at Napoleon’s Theorem from the point of view of circulant matrices 
and the Moore-Penrose generalized matrix inverse. (Davis, 1977, 1979, Chang, 
Chang & Davis). Kimberling has examined triangle geometry from the point of 
view of functional equations; Baptist from the point of view of extremal problems. 

Other than the quiet and steady problem solving activity and the occasional new 
result, the subject was making no waves. Mathematicians, by and large, might play
with individual items as a relaxation; they might even derive intense satisfaction, 
but they probably would not have wanted their professional reputation to be 
judged by a contribution of this sort. 

“The song is ended, but the melody lingers on.”


3. WHY DID TRIANGLE GEOMETRY DIE? What reasons can be given for the 
short life of triangle geometry as a strongly and coherently delineated corpus of 
results, sanctioned by the mathematical establishment? I can suggest a few. 
Though this is hardly as complex a phenomenon as the decline and fall of the Roman
Empire, I will not assert that I have gotten to the heart of the matter.

(1) The perception that the subject is part of elementary, “amateur”, or 
recreational mathematics and therefore is of low professional status. The subject is 
not “deep”. At the level of personal psychology, there was a feeling that even if a 
proof of a statement was not fairly transparent or immediately forthcoming, one 
could always “bulldoze” one’s way through a proof via analytic geometry. So why 
bother? 

To speak of the professional status of certain problems, one must deal with the 
relationship between the inner challenges of a field and the outer sociology of 
mathematicians. The latter includes the reward structure of mathematical activity.
Where one group of mathematicians may charge a second group with
“amateurism”, the second may counter with a charge of “elitism”. (See Fang and
Takayama, Wilder, 1968, 1981). There is a similarity here with the field of music
where the profession unselfconsciously divides its output into “classics”, “light
classics”, “popular”, “highbrow”, “lowbrow”, and into many other status
categories.

(2) The inner exhaustion of the interest and variety of its theorematic and 
methodologic possibilities. The Encyklopaedie article by Berkhan and Meyer 
presented no challenges or suggested directions for future development. No 
outstanding, long unsolved problems emerged to capture the imagination and 
challenge mathematical brilliance in the way that the famous Hilbert list of 
problems did. In a word, no really new ideas emerged from triangle geometry. 

In this connection, however, I should mention one idea that emerged briefly 
from triangle geometry and is analogous to a currently thriving field: computa- 
tional complexity. Called geometrography, it seems to have originated in a talk 
given by E. Lemoine at the Congress of the French Association for the Advance- 
ment of Science at Orano in 1888. The reader will find writeups in Alasia, pp. 
29-44, in Coolidge, Chap. III, in Lemoine, and in Mackay (1893/4).

The idea of geometrography is as follows: beginning with a basic figure (often a
triangle), construct a distinguished point or figure, often with ruler and compass, 
but also with other means, and then count the number of elementary operations 
required to do so. 

Alasia’s elementary constructions (operations) are five in number: (1) $R_1$: Place
a ruler’s edge through a given point. (2) $R_2$: Draw a straight line. (3) $C_1$: Place one
point of a compass on a given point. (4) $C_2$: Place one point of a compass on an
indeterminate point of a line. (5) $C_3$: Draw a circle. Now count up how many of
these operations are required to effectuate the required construction. Call the
total number of operations the simplicity of the construction.

The simplicity of many constructions will be found computed in the books 
alluded to. Here is one result: given the side of a regular pentagon, construct the 
circle in which the pentagon can be inscribed. The count given is $8R_1 + 4R_2 +
11C_1 + 8C_3$, yielding a coefficient of simplicity of 31.
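
The bookkeeping behind such a count is simple enough to mechanize. The Python sketch below is our illustration. It tallies elementary operations in a counter keyed by the labels R1, R2, C1, C2, C3 and sums them. Two points are assumptions on our part: the subscripts attached to the pentagon count (the source is hard to read at this point), and the cost of drawing a line through two given points, taken here as two ruler placements plus one stroke.

```python
from collections import Counter

# Assumed cost of drawing a line through two given points: 2 R1 + 1 R2.
LINE_THROUGH_TWO_POINTS = Counter({"R1": 2, "R2": 1})

# The quoted count for the circumscribed circle of a regular pentagon on a given side
# (subscripts are our reading of the source).
PENTAGON_CIRCLE = Counter({"R1": 8, "R2": 4, "C1": 11, "C3": 8})

def simplicity(counts):
    """Coefficient of simplicity: the total number of elementary operations."""
    return sum(counts.values())

print(simplicity(LINE_THROUGH_TWO_POINTS))   # 3
print(simplicity(PENTAGON_CIRCLE))           # 31

# Counters add, so the cost of a construction built from sub-constructions is a sum.
two_lines = LINE_THROUGH_TWO_POINTS + LINE_THROUGH_TWO_POINTS
print(simplicity(two_lines))                 # 6
```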

Note again that the basic elementary operations are geometrical and not 
arithmetic. Even so, the coefficient of simplicity is strongly reminiscent of counting 
up the total number of floating point operations as is done in computational complexity
theory. 

As far as I am able to determine, the notion of constructive simplicity went 
nowhere in its day. It died on the vine. 

(3) The increasing visual complexity and tediousness of the “deeper” results of 
triangle geometry, combined with 

(4) A view of geometry that had emerged by the end of the 19th Century and 
seriously downgraded the visual in favor of the algebraic/symbolic.

(5) Susceptibility to the feeling of surprise has its ups and downs. (My goodness, 
do those three lines really intersect in one point? Who would have expected it?) 
But the professional is exposed to too many theorems and too many surprises. 
Surprise is accordingly dulled or attenuated and therefore devalued psychologically 
and can easily slip into boredom. 

(6) The reassignment or the migration of some of the content of triangle 
geometry to other traditional or newly emerging fields. As examples: the famous 
Desargues theorem about two triangles in perspective is seen now as part of 
projective geometry. Other theorems are viewed as part of inversive geometry or
algebraic geometry. 

(7) A dearth of connections or applications to other fields considered “live”; in 
particular, to areas in physics, etc. 

However, some counterexamples exist here. Bernhard Neumann, who has 
written on triangle geometry (B. H. Neumann, 1941), told me that his father, 
Richard Neumann, who was an electrical engineer, discovered Napoleon’s Theo- 
rem on his own and made use of it in the theory of three phase alternating current 
circuits. (R. Neumann, 1911, 1939. B. H. Neumann, 1982.) 

(8) Competition arising after World War II from many other geometrical 
constituencies, often with a strong visual component or with claims to wider 
applicability: e.g., convex geometry, tilings, symmetry and group theory, fractals, 
graph theory, computational geometry, etc. (See Malkevitch) 

In a word, the problem of status boils down to this: for the reasons just outlined, 
and perhaps others, none of the major mathematicians of the post World War I 
period considered triangle geometry to be of great importance. It would be 
interesting to position its change in status within the context of the “laws” of the 
evolution of mathematics proposed by Crowe and by Wilder. (Wilder, 1968, 1981.) 


4. ENTER: THE COMPUTER. It was clear, early on, that the computer offered 
the possibility of mathematical experimentation along visual, numerical, and sym- 
bolic/algebraic and logical lines; it offered the possibility of “mechanical” or 
“automatic” proof, and the possibility of the discovery or generation of new 
theorems. 

With the availability of fast computation and convenient, high level languages, it 
was inevitable that the strategies directed toward the above goals would be applied 
to one of the “easiest” of the computable and decidable mathematical theories: 
good old triangle geometry. 

The field of automated reasoning is currently extremely active, boasting an
enthusiastic corps of researchers, international conferences, and a number of
specialized journals. Toward this end:

*We may pursue a numerical road. 

Let us suppose that a certain specific geometrical configuration has been put 
forward. Certain points, lines, curves, have been specified by their specific coordi- 
nates and certain conclusions are to be reached. Suppose that the conclusion may 
be reached by generation of a finite sequence of intersections of the curves 
coupled with the interpolation of new curves to currently available data. Carrying 
out such a program numerically, we reach our conclusion or verify our theorem in 
the specific numerical case set up. We may be even in a position to do all this 
visually using computer graphics. 

In most instances, the numerical answers will be approximate; in favorable 
instances there will be single or multiple precision accuracy. By altering the 
numerical parameters, a family of results can be displayed rapidly, and on this 
basis certain distinguished phenomena, occurrences, (theorems) may be inferred by 
the investigator. This kind of thing goes on constantly in computer graphics, or in 
computer assisted geometric industrial design (CAGD). 

The investigator whose criterion of validity demands more than approximate 
numerical verification in one specific instance must employ other strategies. If the 
given initial configuration consists solely of lines specified as passing through 
points with rational coordinates, then all the computation may be (in principle) 
performed exactly in rational arithmetic. 
There is a strategy available to overcome numerically the lack of generality in
one special numerical case. It will be exhibited in the case of the famous Pappus 
theorem of projective geometry whose initial configuration is two arbitrary straight 
lines and three arbitrary points on each. 

If the coordinates of the points and lines are taken as algebraically independent 
real numbers, then the verification of the theorem in that one case serves to 
demonstrate the theorem generally. (Davis, 1977, Rowland and Davis, 1981, 1981, 
Schwartz, Hong and Tan). 

This leaves hanging in the air the question of what, digitally speaking, a set of 
algebraically independent numbers could possibly mean. Such a set would act as 
our symbols with respect to the computations involved, and hence the force of the 
above remark. 

One might instantiate a “pseudo-algebraically independent” set of numbers by 
taking them as random numbers, and interpreting the result probabilistically. One 
then arrives at the following principle: (valid within a certain limited context) if a 
theorem is true for one randomly selected set of initial configurations, it is true for 
all configurations. This puts at risk the old caveat of mathematics teachers (which 
can have a constipating effect on investigations): you must prove it in all cases, not 
just in one special case. At the same time, it offers the possibility in some instances 
of formalizing the inductive leaps that the mathematical mind takes when con- 
fronted with what seems to be, logically speaking, incomplete evidence. 
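
A minimal sketch of this principle for Pappus' theorem, in Python, follows. It is our illustration and not the procedure of any of the cited papers. Random rational numbers stand in for algebraically independent ones, all arithmetic is exact, and homogeneous coordinates are used so that the line through two points and the meet of two lines are both cross products. The function names, the random seed, and the sampling ranges are arbitrary.

```python
import random
from fractions import Fraction

def cross(p, q):
    """Cross product of homogeneous triples: line through two points, or meet of two lines."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def det3(p, q, r):
    """Determinant of three homogeneous points; it is zero exactly when they are collinear."""
    c = cross(q, r)
    return p[0] * c[0] + p[1] * c[1] + p[2] * c[2]

def random_rational(rng):
    return Fraction(rng.randint(-999, 999), rng.randint(1, 999))

def random_points_on_random_line(rng, count=3):
    """A random line through two random rational points, then `count` random points on it."""
    px, py, qx, qy = (random_rational(rng) for _ in range(4))
    points = []
    for _ in range(count):
        t = random_rational(rng)
        points.append((px + t * (qx - px), py + t * (qy - py), Fraction(1)))
    return points

def pappus_residual(rng):
    """Collinearity determinant of the three Pappus points for one random configuration."""
    A, B, C = random_points_on_random_line(rng)
    a, b, c = random_points_on_random_line(rng)
    X = cross(cross(A, b), cross(a, B))
    Y = cross(cross(A, c), cross(a, C))
    Z = cross(cross(B, c), cross(b, C))
    return det3(X, Y, Z)

rng = random.Random(1995)
print(all(pappus_residual(rng) == 0 for _ in range(20)))
```

Because the arithmetic is rational, a residual of 0 is an exact statement about that instance rather than an approximate one; how far a single random instance may be trusted is the question raised in the preceding paragraph.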

*We may go the road of heuristics, such as developed by George Polya in a
series of popular books. We may try to combine Polya heuristics with strategies of 
Artificial Intelligence (AI). (Newell. R. Davis and D. Lenat.) 

*We may go the road of logic. Tarski has proved that all statements in “Tarski
geometry” are decidable. But it has been found that working with the two basic
predicates given by Tarski, one for betweenness and one for distance, has not been
a very promising approach. (Woos, pp. 206-214, Hao Wang. For a system of
“natural deduction” using the program AUTOMATH, see de Bruijn). 

*We may travel along the symbolic road, using computer packages such as 
FORMAC (now located in the Museum of Ancient Software!), MAPLE, MAC- 
SYMA, MATHEMATICA, and prove Pappus, in a naive and ad hoc way. (Davis 
and Cerutti) 

*We may travel a rather sophisticated algebraic road, a road traveled by Wu, 
Chou, Zhang, Gao, and others, a road that uses such algebraic ideas as Ritt’s
Principle, or Groebner Bases. 

Briefly, and here I follow Chou, 1988, the strategy of Wu’s method is: 

Step 1: First convert the initial geometrical configuration into a set of polyno- 
mial equations. Convert the geometrical conclusion into a polynomial equation. 

The initial configuration (hypotheses) will be specified by 


\[
\begin{aligned}
h_1(u_1, u_2, \ldots, u_d; x_1, \ldots, x_r) &= 0,\\
h_2(u_1, u_2, \ldots, u_d; x_1, \ldots, x_r) &= 0,\\
&\;\;\vdots\\
h_r(u_1, u_2, \ldots, u_d; x_1, \ldots, x_r) &= 0.
\end{aligned}
\]


The conclusion is given by

\[
g(u_1, u_2, \ldots, u_d; x_1, \ldots, x_r) = 0.
\]

In these equations, the u’s are independent variables, while the x’s are
algebraically dependent on the u’s.

Step 2. Using pseudo division and Ritt’s Principle, (or your own method), 
“triangulate” the polynomials. This means replacing the set of h’s by a set which 
introduces one new x at a time. Then check for irreducibility. 

Step 3. Successive pseudodivision to arrive at a final remainder R, after analysis 
of certain non-degenerate conditions. Hopefully, R = 0, indicating that the geo- 
metrical implication is true. 
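
The pseudo-division named in Steps 2 and 3 can be illustrated in its simplest, one-variable form. The sketch below is not Wu's or Chou's implementation; the function name `pseudo_remainder` and the coefficient-list representation are assumptions made purely for illustration. It divides without introducing fractions by first multiplying the dividend by the leading coefficient of the divisor, which is the same device the method applies, one variable at a time, to the triangulated hypotheses.

```python
def pseudo_remainder(f, g):
    """Pseudo-remainder of f by g.

    Polynomials are lists of integer coefficients, highest degree first
    (a hypothetical representation chosen for this sketch).
    """
    f = list(f)
    g = list(g)
    if not g or g[0] == 0:
        raise ValueError("divisor must have a nonzero leading coefficient")
    while len(f) >= len(g) and any(f):
        lead_f, lead_g = f[0], g[0]
        # Scale f by lead(g) so the leading terms cancel without fractions.
        f = [lead_g * c for c in f]
        for i, c in enumerate(g):
            f[i] -= lead_f * c
        f.pop(0)  # the leading coefficient is now zero
    return f


# 4*(x^2 + 1) = (2x - 1)*(2x + 1) + 5, so the pseudo-remainder is the constant 5.
example = pseudo_remainder([1, 0, 1], [2, 1])
# x^2 - 1 is a multiple of x - 1, so every remaining coefficient is zero.
vanishing = pseudo_remainder([1, 0, -1], [1, -1])
```

A final remainder whose coefficients are all zero plays the role of R = 0 in Step 3.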

To indicate the complexity of this approach in some specific examples, Chou 
reports that the proof of the so-called Thebault-Taylor theorem of elementary 
geometry (involving lines, circles, intersections and tangencies) required the 
manipulation of polynomials of almost 700,000 terms. (In their naive approach to 
Pappus’ Theorem of projective geometry, Davis and Cerutti report that the 
number of terms in the polynomials was almost 33,000.) 


5. GENERATION OF NEW THEOREMS USING THE COMPUTER. A number 
of approaches have been explored. 

*Playing around (i.e., mathematical experimentation) visually, numerically, sym- 
bolically. This is quite successful. For example, I have found many theorems 
(unpublished) in the area of group matrices simply by playing around with the 
MATLAB matrix package. Most experimenters can report similar experiences. 

In connection with visual output, I have even argued (Davis, 1974) for the 
recognition of the existence of “visual theorems”, i.e., stable visual patterns, 
generated by a computer algorithm, where what the eye “sees” need not even be 
verbalized, let alone formalized in traditional mathematical language. (As a 
parallel, philosopher Susanne Langer, in the context of music, speaks of the 
“subtle complexes of feeling that language cannot even name, let alone set forth”.) 

*Programmed heuristics. This seems to be less promising. (See, e.g., Newell, 
1981.) R. Davis and D. Lenat have written a program, AM (Automated 
Mathematician), which starts from set theory and proposes to invent new mathematical 
concepts and new conjectures, relying on a library of built-in heuristics. 

Whatever the method employed, numerous new theorems have emerged, some 
of which have attracted attention, surprise and enthusiasm. (Hofstadter; 
Grünbaum and Shephard. In the last named reference, there are some 
philosophical remarks on proof methodology paralleling those just made.) 


6. THE TRANSFIGURATION OF TRIANGLE GEOMETRY. Can a subject arise 
from the dust and ashes that history has piled on it? Only if it is transformed in the 
process. The focus of triangle geometry has now been changed. The computer has 
popped it up a metalevel, and in the process has transfigured the subject. 
Hundreds of elementary and not so elementary theorems that were in the litera- 
ture have now been proved by computer. Many new theorems have been discov- 
ered, again in a variety of ways. Triangle geometry always was a practice ground 
for strategies of proof in the spirit of Euclid, and it has now become a testing 
ground for strategies of decidability, proof, and theorem discovery. These strate- 
gies have run from naive schemes to the employment of deep and abstract results 
of modern algebra and differential algebra. 

But there is yet more that emerges from the change of focus: I believe that the 
experience gained in this change can become a prime source of raw material for 
philosophical discussions on the nature of proof, methodologies of research, the 
role and nature of intuition, educational values, etc. 

What are some of the implications of this work? While I think it is too early to 
write with assurance, I will venture a few observations. 

*Obsessed over the millennia by the vision that mathematics can provide 
absolute, rock bottom “certainty”, the mathematical establishment has often 
expressed its displeasure with certain types of “proof”: visual, mechanical, 
experimental, probabilistic. This attitude goes back as far as Archimedes (200 B.C.), if 
not further. 

Computer proof, theorem discovery, and mathematical experimentation are now 
openly acknowledged as legitimate methodologies and roads to mathematical 
knowledge. 

Thus, absolutely rigorous mathematical proof, as an ideal, is giving way and is 
now seen as a part of a wider, more generous and more flexible notion that I like 
to call “mathematical evidence”. 

*Given that the output of the whole mathematical world, measured in terms of 
numbers of theorems, is of the order of one hundred thousand per annum, what 
use would the automatic generation of theorems in a restricted, well ploughed area 
serve? As A. L. Crelle correctly observed in the epigraph placed at the head of this 
article, the simplest mathematical structure can produce an unlimited number of 
conclusions. What, therefore, does one “do” with mathematical products that 
might be stamped out like doughnuts in a doughnut machine? 

*In this process, the individual theorem may stand devalued. For example, it 
may be vital in a certain application to know that the product 12563*502 = 6306626, 
but to the average mathematical mind, the theorem expressed by this multiplicative 
identity is quite tedious. 
Since this kind of theorem (of arithmetic, triangle geometry or whatever) can now 
be produced by the hundredfold, the emphasis inevitably changes from the 
theorem to the means by which the theorem is produced. By and large, the 
medium becomes the message. This is one of the lessons taught by the subsumption 
of geometry by algebra that occurred as a result of the revolutionary vision of 
Descartes. 

*An individual theorem may still be judged as to its importance, practical or 
otherwise, and may turn out to have such importance. This importance is deter- 
mined by subjective and historical criteria. 

The process whereby a mathematical concept, whether it be a simple point in a 
triangle or a whole complex theory, becomes “distinguished”, is not capable of 
formalization. (Wos sets up this problem as one of 33 basic research problems in 
automated reasoning.) It is a historical process and may involve the whole scientific 
community or significant subsets of that community. 

*The inner complexity of some proofs by computer, often involving polynomials 
of hundreds or thousands of terms, adds new respect and appreciation for the 
historic methods and traditional manner in which the results were presented. 

*The mysterious, omnipresent and vitally essential “mathematical intuition” 
together with its components of experience, analogy, educated guessing, and 
transcendental, non-explainable “pre-knowledge” (e.g., Ramanujan), all get raised 
a metalevel and now can operate in a wider arena. 

*As regards mathematical education, I think the message is clear. Classical 
proof must move over and share the educational stage and time with other means 
of arriving at mathematical evidence and knowledge. Mathematical textbooks must 
modify the often deadening rigidity of the Euclidean model of exposition. 
7. CONCLUDING REMARKS. 


“The whole cultural world, in all its forms, exists through tradition. These forms have arisen not 
merely causally... they have arisen within our human space through human activity.” —Edmund 
Husserl, The Origin of Geometry. 


The distinguished logician Hao Wang, as a result of his interest in proof by 
computer, was once charged with desiring to eliminate the mathematician. He 
answered “No, only the inferior ones”. I should like to interpret this remark more 
charitably: that the capabilities of all mathematicians are elevated by their associa- 
tion with computation. The transformation by the computer of triangle geometry, 
and of many other areas has, paradoxically, reconfirmed and strengthened the vital 
role of humans in the wonderful activity known as doing mathematics. Put it even 
more strongly: mathematics develops in such a way that the role of the mathemati- 
cians is always manifest. 
Totally Real Origami and Impossible 
Paper Folding 


David Auckly and John Cleveland 


Origami is the ancient Japanese art of paper folding. It is possible to fold many 
intriguing geometrical shapes with paper [Ml]. In this article, the question we will 
answer is which shapes are possible to construct and which shapes are impossible 
to construct using origami. One of the most interesting things we discovered is that 
it is impossible to construct a cube with twice the volume of a given cube using 
Origami, just as it is impossible to do using a compass and straight edge. As an 
unexpected surprise, our algebraic characterization of origami is related to David 
Hilbert’s 17th problem. Hilbert’s problem is to show that any rational function 
which is always non-negative is a sum of squares of rational functions [B]. This 
problem was solved by Artin in 1926 [Ar]. We would like to thank John Tate for 
noticing the relationship between our present work and Hilbert’s 17th problem. 
This research is the result of a project in the Junior Fellows Program at The 
University of Texas. The Junior Fellows Program is a program in which a junior 
undergraduate strives to do original research under the guidance of a faculty 
mentor. 

The referee mentioned two references which the reader may find interesting. 
“Geometric Exercises in Paper Folding” addresses practical problems of paper 
folding [R]. Among many other things, Sundara Row gives constructions for the 
5-gon, the 17-gon, and duplicating a cube. His constructions, however, use more 
general folding techniques than the ones we consider here. Felix Klein cites Row’s 
work in his lectures on selected questions in elementary geometry [K]. 

In order to understand the rules of origami construction, we will first consider a 
sheet of everyday notebook paper. Our work with notebook paper will serve as an 
intuitive model for our definition of origami constructions in the Euclidean plane. 
There are four natural methods of folding a piece of paper. The methods will serve 
as the basis of the definition of an origami pair. 

We construct the line $L_1$ by folding a crease between two different corners of 
the paper. Another line may be constructed by matching two corners. For example, 
if corners $\alpha$ and $\gamma$ are matched, the crease formed, $L_2$, will be the perpendicular 
bisector of the segment $\alpha\gamma$. Another natural construction is matching one line to 
another line. For instance, $\beta\gamma$, the paper’s edge, and $L_1$ are lines. If we lay $\beta\gamma$ 
upon $L_1$ and form the crease, then we obtain $L_3$, which is the angle bisector of the 
two lines. If we start with two parallel lines in this third construction, then we will 
just get a parallel line halfway in between. 

The fourth and final construction which seems natural is consecutive folding. 
This is similar to rolling up the sheet of paper, except that one folds it up instead 
of rolling it. More explicitly, start with a piece of paper with two creases on it as in 
Figure 2. Fold along line $L_1$ and do not unfold the piece of paper. Notice that line $L_2$ lies 
over the sheet of paper. With the paper still folded, fold the sheet of paper along 
the crease $L_2$ to obtain a new crease on the sheet underneath $L_2$. If we name this 
new crease $L_3$ and unfold the sheet of paper, then it is easy to see that line $L_3$ is 
the mirror image or reflection of line $L_2$ about line $L_1$. 

Figure 2 

We now formalize these methods to define an origami pair on the plane. The 
creases on our sheet of paper are merely lines in the plane, and the corners of the 
paper are represented by points where lines (creases) meet. This previous discus- 
sion is the motivation for the following definition. 


Definition. $(\mathcal{P}, \mathcal{L})$ is an origami pair if $\mathcal{P}$ is a set of points in $\mathbb{R}^2$ and $\mathcal{L}$ is a 
collection of lines in $\mathbb{R}^2$ satisfying: 


i) The point of intersection of any two non-parallel lines in $\mathcal{L}$ is a point in $\mathcal{P}$. 
ii) Given any two distinct points in $\mathcal{P}$, there is a line in $\mathcal{L}$ going through 
them. 

iii) Given any two distinct points in $\mathcal{P}$, the perpendicular bisector of the line 
segment with given end points is a line in $\mathcal{L}$. 

iv) If $L_1$ and $L_2$ are lines in $\mathcal{L}$, then the line which is equidistant from $L_1$ and 
$L_2$ is in $\mathcal{L}$. 

v) If $L_1$ and $L_2$ are lines in $\mathcal{L}$, then there exists a line $L_3$ in $\mathcal{L}$ such that $L_3$ 
is the mirror reflection of $L_2$ about $L_1$. 


For any subset of the plane containing at least two points, there is at most one 
collection of lines which will pair with it to become an origami pair. 


Definition. A subset $\mathcal{P}$ of $\mathbb{R}^2$ is closed under origami constructions if there exists 
a collection of lines, $\mathcal{L}$, such that $(\mathcal{P}, \mathcal{L})$ is an origami pair. 

The question which we answer in this paper is which points may be constructed 
from just two points, using only the origami constructions described above. We will 
call that collection of points the set of origami constructible points. 


Definition. $\mathcal{P}_0 = \bigcap \{\mathcal{P} \mid (0,0), (0,1) \in \mathcal{P}$ and $\mathcal{P}$ is closed under origami 
constructions$\}$ is the set of origami constructible points. 

Before we explain the structure of $\mathcal{P}_0$, we give an example of an origami 
construction analogous to many compass and straight edge constructions, namely, 
the construction of parallel lines. 


Lemma. It is possible to construct a line parallel to a given line through any given 
point using origami. 


Proof: Refer to Figure 3. Given a line $L$ and a point $p$, pick two points $p_1$ and $p_2$ 
on $L$. By property ii) in the definition of an origami pair, we may construct lines 
$L_1$ and $L_2$ running through $p_1, p$ and $p_2, p$, respectively. By property v) we may 
reflect $L_1$ and $L_2$ through $L$ to obtain $L_3$ and $L_4$. Now the intersection of $L_3$ and 
$L_4$ is a constructible point, so there is a line, $L_5$, through this point and the given 
point, $p$, by properties i) and ii). Call the point where $L_5$ and $L$ intersect $p_3$. To 
finish the construction, use property iii) to construct the perpendicular bisector of 
the segment $p p_3$, and reflect $L$ through this bisector with property v) to obtain the desired 
line, $L_6$. It is a straightforward exercise to show that $L_6$ has the desired properties. 


Figure 3 
The reader may wish to try some constructions on his own. Two especially 
interesting exercises to attempt are the construction of a right triangle with given 
legs and the construction of a right triangle with a given hypotenuse and leg. More 
explicitly, given four distinct points $\alpha$, $\beta$, $\gamma$ and $\delta$, the reader may try to construct a 
point $\varepsilon$ such that $\alpha, \beta, \varepsilon$ are the vertices of a right triangle with legs $\alpha\beta$ and $\beta\varepsilon$ 
such that the length of $\beta\varepsilon$ equals the length of $\gamma\delta$. 

Now that we have a better feel for origami constructions, we will start develop- 
ing tools to show that some figures are not constructible. The first thing we need is 
the notion of an origami number. 


Definition. $F_0 = \{\alpha \in \mathbb{R} \mid \exists\, v_1, v_2 \in \mathcal{P}_0$ such that $|\alpha| = \operatorname{dist}(v_1, v_2)\}$ is the set of 
origami numbers. 

It is easy to see that $(x, y) \in \mathcal{P}_0$ if and only if $x$ and $y$ are both in $F_0$. It is also 
easy to see that the numbers 5, 4,4,... are origami numbers. To see that + is an 
origami number consider Figure 4. In Figure 4 a line through (0, 2) and (1,0) is 
constructed, then a parallel line through (0, =) is constructed. This parallel line 
intersects the x-axis at (<,0), therefore = is an origami number. Another class of 
origami numbers can be generalized by a simple geometric construction. Starting 


with any segment, it is possible to construct a right triangle as in Figure 5. It 
follows that $\sqrt{1 + \alpha^2}$ is an origami number whenever $\alpha$ is an origami number. 
Using this construction, we see that 

\[
\sqrt{2} = \sqrt{1 + 1^2} \quad\text{and}\quad \sqrt{3} = \sqrt{1 + (\sqrt{2})^2}
\]

are origami numbers. In fact, the sum, difference, product, and quotient of origami 
numbers are origami numbers. 


Figure 4 


Figure 5 
Theorem. The collection of origami numbers, $F_0$, is a field closed under the operation 
$\alpha \mapsto \sqrt{1 + \alpha^2}$. 


Proof: If $\alpha, \beta \in F_0$, it follows from the definition that $-\alpha \in F_0$, and it is easy to 
show that $\alpha + \beta \in F_0$. Straightforward constructions with similar triangles are 
enough to show that $\alpha \cdot \beta, \alpha^{-1} \in F_0$. See Figure 6. In the discussion preceding 
this theorem we showed that $\sqrt{1 + \alpha^2}$ is an origami number whenever $\alpha$ is. The 
proof is therefore complete. 


Figure 6 


Now that we have some algebraic operations which will produce origami 
numbers, it is natural to ask if there are any more operations which will produce 
origami numbers. Once we have a list of all ways to create origami numbers and a 
method to test if a given number can be achieved, then we will know which 
geometric shapes are constructible, and which shapes are not constructible. This is 
because any figure is constructible if and only if the coordinates of all of the 
vertices are origami numbers. 


Definition. $F_{\sqrt{1+x^2}}$ is the smallest subfield of $\mathbb{C}$ closed under the operation 
$x \mapsto \sqrt{1 + x^2}$. 


The preceding Theorem may be rephrased as $F_{\sqrt{1+x^2}} \subseteq F_0$. It is in fact true that 
$F_0 = F_{\sqrt{1+x^2}}$. Thus, the previously listed operations which produce origami numbers 
are the only independent operations which produce origami numbers. 


Theorem. $F_0 = F_{\sqrt{1+x^2}}$. 


Proof: Since we already know that $F_{\sqrt{1+x^2}} \subseteq F_0$, we only need to show that 
$F_0 \subseteq F_{\sqrt{1+x^2}}$. That is, we need to show that any origami number may be expressed using 
the usual field operations and the operation $x \mapsto \sqrt{1 + x^2}$. It is enough to 
consider the coordinates of origami constructible points, because a number is an 
origami number if and only if it is a coordinate of a constructible point. There are 
only four distinct ways of constructing new origami points from old ones using the 
axioms for origami construction. These are illustrated in Figures 7 and 8. The only 
way a new point will be constructed is by a new crease intersecting an old one. The 
four ways of making a crease are: folding a line between two existing points as in 
the line yb in Figure 7, folding the perpendicular bisector to two points as in the 
second part of Figure 7, reflecting a line as in the third part of Figure 7, or forming 
Figure 8 


the angle bisector as in Figure 8. We will explain the case illustrated in Figure 8 
and leave the remaining three cases to the reader. When showing that the point 
$(x, y)$ only depends on the prescribed operations, we may assume that $(a_1, a_2) = 
(0,0)$ by translation, because the point $(x, y)$ is found by adding $(a_1, a_2)$ to the 
translated point. We may further assume that $(b_1, b_2)$ is on the unit circle, by 
scaling, because multiplying by $\sqrt{b_1^2 + b_2^2} = |b_1| \sqrt{1 + (b_2/b_1)^2}$ will reverse the 
scaling. Even further, $(b_1, b_2)$ may be assumed to be $(1,0)$ because the rotation 
$(x, y) \mapsto (b_1 x - b_2 y,\; b_2 x + b_1 y)$ sends the point $(1,0)$ back to $(b_1, b_2)$. Let $\theta = 
\angle cab$. With the above assumptions, $\cot\theta = c_1/c_2$ and 

\[
\csc\theta = \sqrt{c_1^2 + c_2^2}\,/\,c_2 = \sqrt{1 + (c_1/c_2)^2}.
\]

Now the slope of the new crease is $m = \tan(\theta/2) = \csc\theta - \cot\theta$, which only 
depends on the prescribed operations. The new point $(x, y)$ is the intersection of 
the two lines $y = mx$ and $y = [c_2/(c_1 - 1)](x - 1)$, so 

\[
x = \frac{c_2}{c_2 - m(c_1 - 1)} \quad\text{and}\quad y = \frac{m c_2}{c_2 - m(c_1 - 1)},
\]

which only depends on the prescribed operations as was to be shown. 
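
The angle-bisector case can be checked numerically. The following is only a sketch under the normalization used in the proof, a = (0,0) and b = (1,0), with the extra assumption c_2 > 0; the function name `bisector_crease_point` is hypothetical. It computes the slope m using nothing beyond field operations and the single operation x -> sqrt(1 + x^2), then intersects the crease with the line through b and c.

```python
import math


def bisector_crease_point(c1, c2):
    """Slope of the bisector of angle cab and its intersection with line bc.

    Assumes a = (0, 0), b = (1, 0), c = (c1, c2) with c2 > 0 and c1 != 1.
    """
    cot = c1 / c2
    csc = math.sqrt(1 + cot * cot)  # the only non-field operation used
    m = csc - cot                   # equals tan(theta / 2)
    denom = c2 - m * (c1 - 1)
    x = c2 / denom
    y = m * x
    return m, x, y


# With c = (3, 4): cot = 3/4, csc = 5/4, so the crease has slope 1/2.
m, x, y = bisector_crease_point(3.0, 4.0)
```

For c = (3, 4) the returned point lies on both y = mx and y = 2(x - 1), as the proof requires.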


The preceding theorem gives an algebraic description of the field of origami 
numbers, and in principle answers which shapes are constructible and which are 
not constructible using origami. In practice it is still difficult to decide whether or 
not a given number is an origami number. For example, $\sqrt{4 + 2\sqrt{2}}$ is an origami 
number because $\sqrt{4 + 2\sqrt{2}} = \sqrt{1 + (1 + \sqrt{2})^2}$, but what about $\sqrt{1 + \sqrt{2}}$? In 
order to answer this question we need a better characterization of origami 
numbers. Before we proceed we will review some elementary facts from abstract 
algebra [AH], [L]. 
Definition. A number, $\alpha$, is an algebraic number if it is a root of a polynomial with 
rational coefficients. 

Any algebraic number, $\alpha$, is a root of a unique monic irreducible polynomial in 
$\mathbb{Q}[x]$, denoted by $p_\alpha(x)$. This polynomial, moreover, divides any polynomial in 
$\mathbb{Q}[x]$ having $\alpha$ as a root. 


Definition. The conjugates of $\alpha$ are the roots of the polynomial $p_\alpha(x)$. An 
algebraic number is totally real if all of its conjugates are real. We denote the set 
of totally real numbers by $F_{TR}$. 


Of the numbers which we are using to motivate this section, $\sqrt{4 + 2\sqrt{2}}$ is 
totally real, because all of its conjugates ($\pm\sqrt{4 + 2\sqrt{2}}$, $\pm\sqrt{4 - 2\sqrt{2}}$) are real, but $\sqrt{1 + \sqrt{2}}$ is 
not totally real because two of its conjugates are imaginary ($\pm\sqrt{1 - \sqrt{2}}$). 
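
The two motivating numbers can be examined numerically. This is an illustrative sketch, not part of the original argument: the helper names are hypothetical, and the polynomials x^4 - 8x^2 + 8 and x^4 - 2x^2 - 1 are obtained here by squaring twice (they are the minimal polynomials of the two numbers, each being irreducible by Eisenstein's criterion at 2, the second after the substitution x -> x + 1).

```python
import cmath


def biquadratic_roots(p, q):
    """All four roots of x^4 + p*x^2 + q, found by solving for x^2 first."""
    disc = cmath.sqrt(p * p - 4 * q)
    roots = []
    for square in ((-p + disc) / 2, (-p - disc) / 2):
        r = cmath.sqrt(square)
        roots.extend([r, -r])
    return roots


def all_real(roots, tol=1e-12):
    """True when every root has a negligible imaginary part."""
    return all(abs(z.imag) < tol for z in roots)


# Conjugates of sqrt(4 + 2*sqrt(2)): roots of x^4 - 8x^2 + 8.
roots_a = biquadratic_roots(-8, 8)
# Conjugates of sqrt(1 + sqrt(2)): roots of x^4 - 2x^2 - 1.
roots_b = biquadratic_roots(-2, -1)
```

Here `all_real(roots_a)` holds, while `roots_b` contains the two purely imaginary values that come from the square root of 1 - sqrt(2).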

The last topic which we review is symmetric polynomials. The symmetric group 
on $n$ letters acts on polynomials in $n$ variables by $\sigma f(x_1, x_2, \ldots, x_n) = 
f(x_{\sigma(1)}, x_{\sigma(2)}, \ldots, x_{\sigma(n)})$ where $f \in R[x_1, \ldots, x_n]$ and $R$ is an arbitrary ring. 


Definition. The fixed points of the above action are called symmetric polynomials 
over R. 

For example, $x_1^2 + x_2^2$ is a symmetric polynomial in two variables because it 
remains unchanged when the variables are interchanged. However, $x_1^2 - x_2^3$ is not 
a symmetric polynomial because it becomes $x_2^2 - x_1^3 \neq x_1^2 - x_2^3$ when $x_1$ and $x_2$ 
are interchanged. One important class of symmetric polynomials is the class of 
elementary symmetric polynomials. 


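Definition. If $\prod_{k=1}^{n} (t + x_k)$ is expanded, we obtain 

\[
\prod_{k=1}^{n} (t + x_k) = \sum_{l=0}^{n} \sigma_l(x_1, \ldots, x_n)\, t^{n-l}.
\]

The $\sigma_l(x_1, \ldots, x_n)$ are the elementary symmetric polynomials. 


It is easily verified that 

\[
\begin{aligned}
\sigma_1 &= x_1 + x_2 + \cdots + x_n, \\
\sigma_l &= \text{the sum of all products of } l \text{ distinct } x_k\text{'s}, \\
\sigma_n &= x_1 \cdot x_2 \cdots x_n.
\end{aligned}
\]
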
Fact. [L, page 191]. The algebra of symmetric polynomials over R is generated by 
the elementary symmetric polynomials. That is, any symmetric polynomial is a 
linear combination of products of the elementary symmetric polynomials. 

We will now begin the final characterization of the origami numbers. It happens 
that all origami numbers are totally real. To prove this, it is necessary to show that 
the sum, difference, product and quotient of totally real numbers is totally real, 
and that $\sqrt{1 + \alpha^2}$ is totally real whenever $\alpha$ is totally real. This is proven by using 
symmetric polynomials and the following lemma. 


Lemma. 

\[
\prod_{i=1}^{n} \prod_{j=1}^{m} (t - x_i y_j) = \det(tI - AB),
\]

where $A$ and $B$ are matrices with entries expressed in terms of the elementary 
symmetric polynomials of the $x_i$ or the $y_j$, respectively. 
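This lemma is interesting because it is easier to prove a more general statement 
which implies the lemma than it is to verify the lemma. We will prove the lemma 
when the $x_i$ and $y_j$ are independent variables, a more general statement than 
when the $x_i$ and $y_j$ represent numbers, but, nevertheless, an easier statement to 
prove. 


Proof: Let 

\[
P_x(t) = \prod_{k=1}^{n} (t - x_k) = \sum_{l=0}^{n} (-1)^l \sigma_l(x)\, t^{n-l}
\]

and 

\[
P_y(t) = \prod_{j=1}^{m} (t - y_j) = \sum_{j=0}^{m} (-1)^j \sigma_j(y)\, t^{m-j}.
\]

Let 

\[
V_{k,l} = \bigl(1, x_k, \ldots, x_k^{n-1},\; y_l, x_k y_l, \ldots, x_k^{n-1} y_l,\; \ldots,\; y_l^{m-1}, x_k y_l^{m-1}, \ldots, x_k^{n-1} y_l^{m-1}\bigr)^{T},
\]

and let $\hat{A}$ be the $n \times n$ matrix 

\[
\hat{A} = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
(-1)^{n+1}\sigma_n(x) & (-1)^{n}\sigma_{n-1}(x) & (-1)^{n-1}\sigma_{n-2}(x) & \cdots & \sigma_1(x)
\end{pmatrix}.
\]

Now let $A$ be the following $nm \times nm$ block-diagonal matrix 

\[
A = \begin{pmatrix}
\hat{A} & & & \\
 & \hat{A} & & \\
 & & \ddots & \\
 & & & \hat{A}
\end{pmatrix}.
\]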
By plugging $x_k$ into $P_x(t)$, we find that 

\[
x_k^{n} = \sum_{l=1}^{n} (-1)^{l+1} \sigma_l(x)\, x_k^{n-l}.
\]

This implies that 

\[
A V_{k,l} = x_k V_{k,l},
\]

where $A$ is independent of $k$ and $l$. In a similar way we can construct a matrix, $B$, 
with entries given by the elementary symmetric functions such that 

\[
B V_{k,l} = y_l V_{k,l}.
\]

Now 

\[
AB V_{k,l} = A\, y_l V_{k,l} = y_l\, A V_{k,l} = x_k y_l V_{k,l}.
\]

Thus $\{x_k y_l\}$ are $nm$ distinct roots of $\det(tI - AB)$, which is a monic polynomial of 
degree $nm$. Therefore, 

\[
\prod_{i=1}^{n} \prod_{j=1}^{m} (t - x_i y_j) = \det(tI - AB).
\]

If the $x_i$’s and $y_j$’s were not independent variables, we would not be able to 
conclude that the elements in $\{x_k y_l\}$ are distinct. 
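
The identity in the lemma can be spot-checked for particular numbers. The sketch below is only an illustration with hypothetical helper names: it builds the companion matrices of the x's and y's with exact rational arithmetic and uses the Kronecker product of the two, which is what the product AB of the block matrices in the proof amounts to when the entries of the vector are ordered as above. It then compares det(tI - AB) with the double product at several integer values of t.

```python
from fractions import Fraction


def companion(roots):
    """Companion matrix C of prod (t - r): C maps (1, r, ..., r^(n-1)) to r times itself."""
    coeffs = [Fraction(1)]  # polynomial coefficients, lowest degree first
    for r in roots:
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i + 1] += c      # contribution of t * (c * t^i)
            nxt[i] -= r * c      # contribution of -r * (c * t^i)
        coeffs = nxt
    n = len(roots)
    rows = [[Fraction(int(j == i + 1)) for j in range(n)] for i in range(n - 1)]
    rows.append([-coeffs[j] for j in range(n)])  # rewrites t^n in lower powers
    return rows


def kron(a, b):
    """Kronecker product of two square matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def det(m):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [row[:] for row in m]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def lemma_holds(xs, ys, sample_points):
    """Compare det(t*I - AB) with prod (t - x*y) at each sample value of t."""
    ab = kron(companion(ys), companion(xs))
    size = len(ab)
    for t in sample_points:
        shifted = [[(t if i == j else 0) - ab[i][j] for j in range(size)]
                   for i in range(size)]
        expected = Fraction(1)
        for x in xs:
            for y in ys:
                expected *= t - x * y
        if det(shifted) != expected:
            return False
    return True


check = lemma_holds([1, 2], [3, 5], range(-3, 4))
```

Both sides are monic polynomials of degree nm in t, so agreement at more than nm sample points means they coincide for the chosen x's and y's.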


With this lemma, we are ready to prove that the set of totally real numbers forms 
a field closed under the operation $x \mapsto \sqrt{1 + x^2}$. 


Theorem. $F_{\sqrt{1+x^2}} \subseteq F_{TR}$. 


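Proof: If $\alpha, \beta \in F_{TR}$, we must show that $-\alpha, \alpha^{-1}, \sqrt{1 + \alpha^2}, \alpha + \beta, \alpha \cdot \beta \in F_{TR}$. 
Let $\{\alpha_i\}_{i=1}^{n}$ be the conjugates of $\alpha$ and $\{\beta_j\}_{j=1}^{m}$ be the conjugates of $\beta$. We will 
prove the theorem by considering the following five polynomials. 

\[
\begin{aligned}
q_{-\alpha}(t) &= \prod_{i=1}^{n} (t + \alpha_i), \\
q_{\alpha^{-1}}(t) &= \prod_{i=1}^{n} (t - \alpha_i^{-1}), \\
q_{\sqrt{1+\alpha^2}}(t) &= \prod_{i=1}^{n} \Bigl(t - \sqrt{1 + \alpha_i^2}\Bigr)\Bigl(t + \sqrt{1 + \alpha_i^2}\Bigr), \\
q_{\alpha+\beta}(t) &= \prod_{i=1}^{n} \prod_{j=1}^{m} (t - \alpha_i - \beta_j), \\
q_{\alpha\beta}(t) &= \prod_{i=1}^{n} \prod_{j=1}^{m} (t - \alpha_i \beta_j).
\end{aligned}
\]

The proofs of the first three cases are similar, and the proofs of the last two cases 
are similar, so we will only prove, in detail, the third case and the fifth case. If we 
expand $q_{\sqrt{1+\alpha^2}}(t)$ it is clear that the coefficients of $t^k$ will be symmetric polynomials 
in the $\alpha_i$. They may, therefore, be expressed as rational polynomials in the 
elementary symmetric polynomials of the $\alpha_i$. Since $(-1)^l \sigma_l(\alpha)$ are the coefficients 
of the minimal polynomial for $\alpha$, we may conclude that $q_{\sqrt{1+\alpha^2}}(t) \in \mathbb{Q}[t]$. It is 
clear that $\sqrt{1 + \alpha^2}$ is a root of $q_{\sqrt{1+\alpha^2}}(t)$; thus the minimal polynomial of 
$\sqrt{1 + \alpha^2}$, $p_{\sqrt{1+\alpha^2}}(t)$, divides $q_{\sqrt{1+\alpha^2}}(t)$. The fact that $\alpha$ is totally real implies that 
all of the conjugates, $\alpha_i$, are real. Thus, the $1 + \alpha_i^2$ are all real and positive, so 
the $\pm\sqrt{1 + \alpha_i^2}$ are all real. We now conclude that all of the roots of $q_{\sqrt{1+\alpha^2}}(t)$ are 
real, and therefore $\sqrt{1 + \alpha^2}$ is totally real. 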

For the fifth case, we use the previous lemma to conclude that $q_{\alpha\beta}(t) \in \mathbb{Q}[t]$. 
Clearly, $\alpha\beta$ is a root of $q_{\alpha\beta}(t)$ and all of the roots of $q_{\alpha\beta}(t)$ are real because $\alpha$ and 
$\beta$ are totally real. In the other three cases, it is necessary to show that each of the 
$q$’s is a polynomial with rational coefficients and only real roots. The first two 
cases may be tackled with the fact that the elementary symmetric polynomials 
generate the algebra of all symmetric polynomials. The fourth case may be verified 
with a lemma analogous to the previous lemma stating that 

\[
\prod_{i=1}^{n} \prod_{j=1}^{m} (t - x_i - y_j) = \det(tI - A - B).
\]


This theorem gives us a practical way to decide that certain shapes may not be 
constructed using origami. For example, it is not possible, using origami, to 
construct two cubes such that the volume of the second cube is twice that of the 
first cube. If this construction were possible, $\sqrt[3]{2}$ would be an origami number and 
would therefore be totally real. One, however, finds that the conjugates of $\sqrt[3]{2}$ are 
$\sqrt[3]{2}\,\bigl(-\tfrac{1}{2} \pm \tfrac{\sqrt{3}}{2} i\bigr)$ and $\sqrt[3]{2}$, but the first two are not real, so $\sqrt[3]{2}$ is not an origami 
number. 
As we have seen before, $\sqrt{2} = \sqrt{1 + 1^2}$ and $\sqrt{4 + 2\sqrt{2}} = \sqrt{1 + (1 + \sqrt{2})^2}$ are 
origami numbers, so $\sqrt{2 + \sqrt{2}} = (\sqrt{2})^{-1} \cdot \sqrt{4 + 2\sqrt{2}}$ is an origami number. From this 
we see the following corollary. 


Corollary. It is not possible to construct a right triangle with arbitrarily given 
hypotenuse and leg using origami. 


Proof: If this were possible, it would be possible to construct a right triangle with 
hypotenuse $\sqrt{2 + \sqrt{2}}$ and leg 1, since these are origami numbers. Any such 
triangle would have a leg of length $\sqrt{1 + \sqrt{2}} = \sqrt{\bigl(\sqrt{2 + \sqrt{2}}\bigr)^2 - 1^2}$, but this is 
impossible because $\sqrt{1 + \sqrt{2}}$ is not totally real. 
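
The arithmetic behind this corollary is easy to confirm in floating point. The snippet below is a small sketch with hypothetical variable names; it checks the two identities used above and the sign that makes the offending conjugate imaginary.

```python
import math

sqrt2 = math.sqrt(2)

# The hypotenuse, written directly and as a quotient of two origami numbers.
hypotenuse = math.sqrt(2 + sqrt2)
as_quotient = math.sqrt(4 + 2 * sqrt2) / sqrt2

# The second leg of a right triangle with this hypotenuse and a leg of length 1.
other_leg = math.sqrt(hypotenuse ** 2 - 1 ** 2)
target = math.sqrt(1 + sqrt2)

# The conjugate sqrt(1 - sqrt(2)) is imaginary because 1 - sqrt(2) is negative.
conjugate_square = 1 - sqrt2
```

The values `hypotenuse` and `as_quotient` agree to rounding error, as do `other_leg` and `target`, while `conjugate_square` is negative.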


The following corollary is a consequence of the standard algebraic description 
of compass and straight edge constructions and the two previous theorems [AH]. 


Corollary. Everything which is constructible with origami is constructible with a 
compass and straight edge, but the converse is not true. 


We want to expand on the relationship between compass and straight edge 
constructions and origami constructions. To review compass and straight edge 
constructions, let $F_{\sqrt{x}}$ be the smallest subfield of $\mathbb{C}$ closed under the operation 
$x \mapsto \sqrt{x}$; then $F_{\sqrt{x}} \cap \mathbb{R}$ is the collection of numbers which are constructible with a 
compass and straight edge. From our work thus far, it is evident that the origami 
numbers, $F_0$, are contained in $F_{\sqrt{x}} \cap F_{TR}$. It is in fact the case that $F_0 = F_{\sqrt{x}} \cap F_{TR}$. 
This characterization of the origami numbers is related to David Hilbert’s 17th 
problem. At the International Congress of Mathematicians at Paris in 1900, Hilbert 
gave a list of 23 problems [B]. His 17th problem was to show that any rational 
function which is non-negative when evaluated at any rational number is a sum of 
squares of rational functions. In 1926, Artin solved Hilbert’s 17th problem [Ar]. 
The key idea which Artin used was the notion of totally positive. An element of a 
field is defined to be totally positive if it is positive in every order on the field. 
Artin proved that an element is totally positive if and only if it is a sum of squares. 
This is the idea which we use to prove the final characterization of the origami 
numbers. 


Fact. [L, page 457]. If K is a finite real algebraic extension of Q, then an element 
of K is a sum of squares in K if and only if all of its real conjugates are positive. 


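Theorem. $F_0 = F_{\sqrt{x}} \cap F_{TR}$. 


Proof: We have already shown that $F_0 = F_{\sqrt{1+x^2}}$ and that $F_0 \subseteq F_{\sqrt{x}} \cap F_{TR}$, so we 
need to show that $F_{\sqrt{x}} \cap F_{TR} \subseteq F_{\sqrt{1+x^2}}$. If $\alpha \in F_{\sqrt{x}} \cap F_{TR}$, then there exists a 
sequence of totally real numbers, $\{\beta_i\}_{i=1}^{n}$, and a sequence of totally real fields 
$\{K_i\}_{i=0}^{n}$ such that $K_0 = \mathbb{Q}$, $K_i = K_{i-1}(\beta_i)$, $\alpha = \beta_n$, and each $\beta_i$ has degree 2 over 
$K_{i-1}$. Since $\beta_i$ has degree 2 over $K_{i-1}$, $\beta_i$ is a root of a polynomial of the form 

\[
x^2 + c_i x + d_i,
\]

where $c_i, d_i \in K_{i-1}$. Therefore, $(\beta_i + c_i/2)^2 = c_i^2/4 - d_i$. By the proof of the 
previous theorem, we know that every conjugate of $(\beta_i + c_i/2)^2$ is the square of 
some conjugate of $\beta_i + c_i/2$. Hence, each of the conjugates of $(\beta_i + c_i/2)^2$ is 
positive and $(\beta_i + c_i/2)^2$ is a sum of squares of elements in $K_{i-1}$. Say that 

\[
(\beta_i + c_i/2)^2 = r_1^2 + r_2^2 + \cdots + r_k^2
\]

with each $r_j \neq 0$; then, since 

\[
\sqrt{r_1^2 + \cdots + r_j^2} = |r_j| \sqrt{1 + \Bigl(\sqrt{r_1^2 + \cdots + r_{j-1}^2}\,/\,r_j\Bigr)^2}
\]

for each $j$, the number $\beta_i + c_i/2 = \pm\sqrt{r_1^2 + \cdots + r_k^2}$ is obtained from elements of 
$K_{i-1}$ by field operations and repeated use of $x \mapsto \sqrt{1 + x^2}$, 
and we are done. This shows that any totally real number in $F_{\sqrt{x}}$ is an origami 
number. 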


Legend has it that the ancient Athenians were faced with a plague. In order to 
remedy the situation, they sent a delegation to the oracle of Apollo at Delos. This 
delegation was told to double the volume of the cubical altar to Apollo. However, 
the Athenians doubled the length of each side of the altar, thereby creating an 
altar with eight times the volume rather than twice the volume of the original altar. 
Needless to say, the plague only got worse. For years, people have tried to double 
the size of a cube with compass and straight edge, and the gods have not smiled 
upon them. We now can see that the gods will not be satisfied with our elementary 
origami either. 
An Abstract Algebra Story 


Uri Leron and Ed Dubinsky 


Statement: The teaching of abstract algebra is a disaster, and this remains true 
almost independently of the quality of the lectures. 


We agree. 

And we think there’s a fairly wide consensus on this among experienced 
abstract algebra instructors, and an even wider one among experienced 
students. 


Statement: There’s little the conscientious math professor can do about it. The stuff 
is simply too hard for most students. Students are not well-prepared and they are 
unwilling to make the effort to learn this very difficult material. 


We disagree. 

But we suspect that many experienced abstract algebra instructors hold such 
beliefs. This is especially true for some excellent instructors: Their lectures 
are truly masterpieces, surely you can’t improve much on that; so if the 
students still fail, that’s too bad, but it can’t really be helped. 


We claim that, far from being an immutable fact of nature resulting from 
inadequacies of the student, this failure is, at least in part, an artifact of a too 
narrowly conceived view of instruction. In fact, replacing the lecture method with 
constructive, interactive methods involving computer activities and cooperative learn- 
ing, can change radically the amount of meaningful learning achieved by average 
students. 

In this paper we would like to paint a picture of such an alternative approach, 
which we and others have been developing and using in our classes over the last 
several years. We are painfully aware of the limitations inherent in any attempt to 
give such a description by means of the written text only. It would have been much 
better if you could actually visit our classes and observe the dynamics of the 
students’ interactions with both the computer and their peers. By way of compro- 
mise, we will try to simulate such a visit by organizing our paper around several 
classroom “scenarios” and some commentary on the events depicted in each 
scenario. As a matter of principle, we have tried to make the scenarios as realistic 
as space limitation permits. 


FIRST SCENARIO: WHAT IS IT ALL ABOUT? 


Background. Our approach involves students working on computers in laboratory 
sessions and on their own. It also involves classroom discussions and assigned 
exercises, mainly with paper and pencil. The students work in teams which for the 
most part remain fixed for the entire semester.

This scenario takes place very early in the semester, after the students have
done some computer work. In the exchange we are about to describe, the students 
are using a number of programs they have written in the programming language 
ISETL to implement the group axioms. The students wrote these programs after 
very brief and intentionally vague discussions (in the text and in class) of binary 
operations. The programs are quite simple and close to mathematics. For example, 
here is a program and one of its subprograms, expressing the definition of a group 
and existence of left-identity, respectively. The program is_group accepts as input 
a set G and a binary operation o and returns “true” or “false” according to 
whether the set with the binary operation is or is not a group. Similarly for
has_identity.


is_group := func(G,o);
    return is_closed(G,o) and is_associative(G,o) and
           has_identity(G,o) and has_inverses(G,o);
end;
has_identity := func(G,o);
    return exists e in G | (forall a in G | e .o a = a);
end;
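
For readers without access to ISETL, the two funcs above can be approximated in Python. This is only a sketch, not the students' code: the article shows is_group and has_identity but not the other subprograms, so the bodies of is_closed, is_associative, identity and has_inverses below are assumptions, written as brute-force checks over a finite set.

```python
from itertools import product


def is_closed(G, op):
    # Every product of two elements of G must land back in G.
    return all(op(a, b) in G for a, b in product(G, repeat=2))


def is_associative(G, op):
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(G, repeat=3))


def has_identity(G, op):
    # Mirrors the ISETL text: exists e in G | (forall a in G | e .o a = a)
    return any(all(op(e, a) == a for a in G) for e in G)


def identity(G, op):
    # Assumed helper: return a two-sided identity, or None if there is none.
    for e in G:
        if all(op(e, a) == a and op(a, e) == a for a in G):
            return e
    return None


def has_inverses(G, op):
    # Assumed definition: every element has a two-sided inverse.
    e = identity(G, op)
    if e is None:
        return False
    return all(any(op(a, b) == e and op(b, a) == e for b in G) for a in G)


def is_group(G, op):
    return (is_closed(G, op) and is_associative(G, op)
            and has_identity(G, op) and has_inverses(G, op))


Z12 = set(range(12))
a12 = lambda x, y: (x + y) % 12
m12 = lambda x, y: (x * y) % 12

print(is_group(Z12, a12))         # addition mod 12 on all of Z12
print(is_group(Z12 - {0}, m12))   # multiplication mod 12 without zero
```

As in the ISETL version, the definition reads almost word for word like the axioms; the price is that every check enumerates all pairs or triples, which is only practical for small finite sets.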


Reality. You enter an abstract algebra class. The class is taking place in a 
microcomputer lab. The students are working on computer activities in teams of 2 
to 4. The expressions they type at the keyboard, except for some minor details, 
look pretty much like standard mathematics; as a mathematician you have no 
trouble understanding the meaning of what they type, though you may be totally 
unfamiliar with computers. Before hitting the Return key, which would instruct the 
computer to evaluate their expression and exhibit the result, the students engage 
in a lively discussion trying to predict what this result is going to be. For example, 
you might observe something like the exchange in Figure 1.¹

¹ Real exchanges are characterized by a lengthy and messy zigzag path, and their transcription would
take up more space than we can afford in this article. Therefore we are limited to bringing here only a
stratified and compressed version of a real exchange. In fact we can include only brief, isolated
fragments.


Figure 1. Student interactions within a mathematical computerized environment.

Note: the students’ input is preceded by the ‘>’ prompt; the computer’s output is not.

Interaction with the computer             Interaction within the team

> Z12 := {0..11};                         Why doesn’t Z12 have 12 in it?
> a12 := |x,y -> (x + y) mod 12|;         What’s the difference between Z12
                                          and a12?
> is_group(Z12, a12);                     What are the inputs to is_group?
true;
> identity(Z12, a12);                     What’s happening here? I thought it
0;                                        would be true or false? What is
                                          identity supposed to return?
> is_group(Z12 - {0}, a12);
false;                                    What’s wrong? Let’s check the group
                                          properties.
> is_closed(Z12 - {0}, a12);
false;                                    Aha! So it’s not closed, but I can’t
                                          see why.
> 7 .a12 8 in Z12 - {0};                  Let’s try some numbers.
true;
> 7 .a12 5 in Z12 - {0};
false;                                    I know!
> a := |x,y -> x + y|;                    Let’s try another operation.
> is_group(Z12, a);
false;
> 7 .a 8 in Z12;
false;                                    It’s because 7 + 8 = 15, not 3.
> m12 := |x,y -> (x*y) mod 12|;           What do you think about times?
> is_group(Z12, m12);                     I think it will also be a group.
false;                                    Oops!
> is_closed(Z12, m12);
true;                                     So it is closed...
> has_identity(Z12, m12);
true;                                     ... and it does have an identity...
> has_inverses(Z12, m12);
false;                                    ... but no inverses! What’s an
                                          inverse, anyway?
> is_invertible(Z12, m12, 0);
false;                                    Oh! It must be zero. Let’s take it out.
> is_group(Z12 - {0}, m12);
false;                                    Oops! I don’t understand it.
> is_closed(Z12 - {0}, m12);
false;                                    Not closed? But it was closed
                                          before... Must be the zero...
> 2 .m12 6 = 0;                           What about this...
true;
> 2 .m12 6 in Z12;
true;                                     That’s it. Let’s try another mod? 11?
> Z11 := {0..10};                         Okay, this is the set...
> m11 := |x,y -> (x*y) mod 11|;           ... and the operation...
> is_group(Z11, m11);                     Now let’s try it. I think it will work.
false;                                    Oops! Oh, yes, the zero again.
> is_group(Z11 - {0}, m11);               Now try it.
true;                                     Yeah!!!


Reflections. At this point you, the reader, have surely formed some hypotheses
regarding our class. Let us now try to attend to some possible interpretations (and
mis-interpretations) of what is really going on here. We present some likely
hypotheses and questions posed by an idealized reader (IR), each followed by our
reaction. In fact, these include some of the questions we have most often been
asked when discussing our method with colleagues.


Idealized Reader (conjecturing): The students are involved in a mathematical
investigation concerning group-theoretic properties of modular arithmetic systems,
also practicing at the same time their knowledge of the group axioms.


True.


IR: The students are learning by the “discovery method”.


This is only partially true. In the first place, the students are indeed involved in
posing questions and in trying to discover answers, but far from being left on their
own to do that, they are guided by a worksheet provided by the instructor. For
example, the activity in the above exchange might evolve in response to the
following task:


Explore the modular systems Z_n (with or without 0) relative to addition and
multiplication mod n. Formulate some conjectures, test them and try to give 
some explanations. 
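
One way to picture what such an exploration can turn up is to tabulate the answers for several moduli at once. The Python sketch below is an illustration only, not part of the worksheet; the helper names is_group and explore are hypothetical, and the group test is a brute-force check of the axioms on a small finite set.

```python
from itertools import product


def is_group(G, op):
    """Brute-force test of the group axioms on a finite set G."""
    G = set(G)
    if not all(op(a, b) in G for a, b in product(G, repeat=2)):
        return False  # not closed
    if not all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(G, repeat=3)):
        return False  # not associative
    ids = [e for e in G if all(op(e, a) == a == op(a, e) for a in G)]
    if not ids:
        return False  # no identity
    e = ids[0]
    return all(any(op(a, b) == e == op(b, a) for b in G) for a in G)


def explore(max_n):
    """For n = 2..max_n, report whether (Z_n, +), (Z_n, *) and
    (Z_n without 0, *) are groups."""
    rows = []
    for n in range(2, max_n + 1):
        Zn = set(range(n))
        add = lambda x, y, n=n: (x + y) % n
        mul = lambda x, y, n=n: (x * y) % n
        rows.append((n, is_group(Zn, add), is_group(Zn, mul),
                     is_group(Zn - {0}, mul)))
    return rows


for n, add_ok, mul_ok, mul_nozero_ok in explore(12):
    print(n, add_ok, mul_ok, mul_nozero_ok)
```

The last column is the one that invites a conjecture: within this range it is true exactly for the prime moduli, which matches the students' experience with mod 12 and mod 11 in Figure 1.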


In the second place... 


IR: ... But how can you expect students to discover in a few hours what took the 
best mathematical minds centuries? 


I was just getting there... In the second place, students are not expected to
actually obtain complete, “correct” answers. The main purpose of the activities is
to give them an experiential basis to which they can later relate the more abstract
and formal treatments. Thus, subsequent discussions of a concept are more
meaningful for students who have made a non-trivial effort trying to figure it out
on their own, whether they have actually discovered it or not. Rather than
“discovering” mathematical concepts, we think of our students as constructing
them (in their minds). Computer experiences and classroom discussions are meant
to help them make these mental constructions.


IR: Doesn’t learning how to program take a lot of time away from learning the 
mathematics? 


Because the syntax and basic constructs of ISETL are so close to standard
mathematics, learning the language is inseparable from learning the mathematics;
the programming “overhead” is minimal. You can check this for yourself by seeing
if you have any trouble understanding the ISETL expressions in Figure 1. With our 
students, we spend only the first few sessions actually dealing with the language, 
and even then we do quite a bit of relevant math (like properties of the modular 
operations and of permutations). In writing their own code, and in using it to 
explore particular groups, students gain an understanding of the group concept of 
a quite different sort from that gained by listening to lectures and doing paper and 
pencil exercises (or by using the computer for drill and practice). 


IR: Still, wouldn’t it be better if you gave them a software package in which functions 
like is_group, has_identity, has_inverses are pre-programmed? This way you’d save
them the time of learning to program and the time of programming all these functions 
on their own, and still they would be able to “interact” with the computer. 


Programming is_group and the other functions is where the most important 
learning occurs. It is the goal, not just a tool. Experience, as well as modern 
learning theory (see, for example, Davis, Maher and Noddings, 1990 or Selden and 
Selden, 1990) tells us that one doesn’t learn the group concept by memorizing the 
definition. In order to acquire meaning, the group concept has to be constructed in 
the learner’s mind. Our method is based on the premise that if the students are 
asked to construct the group concept on the computer (by programming it), there 
is a good chance that a parallel construction will occur in their mind. 


IR: Are the computer activities meant to replace the traditional lecture? 


Yes. We think that the lecture method is, for most students, quite ineffective.
Worse still, it makes them feel stupid, alienated. Most students tell us that they
have very little idea of what the lecture is about, even when it is delivered by a
master lecturer. And if you try, say a year later, to see how much was retained
from the course, you’ll discover that it is close to nothing. (See, for example,
Vinner, 1992.)


230 AN ABSTRACT ALGEBRA STORY [March


IR: Are you assuming, then, that through the activities they get all the instruction 
they need? 


Not at all. The activities are followed by team work on assignments, team 
discussions, class discussions of subtle points, summary handouts (or assigned 
reading) of definitions and proofs, exercises etc. The role of the computer activities 
is, as we have said before, to provide an experiential basis for all the other learning 
modes. An important key to making this work is that students need to reflect on 
the computer (or paper and pencil) constructions that they make. A powerful 
stimulus for reflection is our insistence that they do their work in teams. Dis- 
cussing and explaining what they are doing can lead students to make mental 
constructions that are parallel to the ones they are making on the computer. 

When the students eventually encounter the “official”, general, abstract, formal 
version, this is perceived by them not as totally strange and prohibitive (as we 
believe is the case in standard lectures, where such abstractions are presented 
without any experiential basis), but as an elaboration of their previous experience. 
In popular terms we may say that the activities provide an initial intuitive 
familiarity with the topic to be learned. In more psychological terms (supported by 
an elaborate theoretical framework and research), we may say that the activities 
help the students to “construct” the mental processes, objects and relations 
necessary for a meaningful understanding of the topic. 


IR: I, too, believe in giving students intuitive explanations (such as the intuitive idea 
behind a complicated proof), which I always add to the formal part of my lectures.
So why all the fuss? 


Experience, theory and research all point to the fact that verbal explanations that 
do not relate to the student’s prior experience are quite ineffective (except for a 
few individuals with special talent in mathematics). Intuition is the result of
personal experience based on activity and interaction. A verbal representation of 
your intuitions (based on your past experience, activity and interaction) usually 
fails to re-create the same intuitions in the student’s mind. Students need to 
construct the experience for themselves. Verbal explanations can help if they come 
after the student has formed an experiential basis. For then they serve to elaborate 
and conceptualize something the student has already vaguely known through the 
experience. We may say that verbal explanations can elaborate and explicate 
existing intuitions, but cannot create new ones. 


IR: Is your method so perfect? Surely there must be some difficulties, some unsolved 
problems, some things you are not so sure about... 


You must be kidding! The traditional method of teaching by lectures and exercises 
has developed over several hundred years. We feel that we have an approach 
which represents a significant improvement, but we are just beginning. Although 
we have been able to implement our method to obtain promising results in our 
own classes, and we are even beginning to learn how to disseminate this approach
to others, a multitude of problems remain and it will be a very long time before
they are all solved. 

There are difficulties inherent in the use of computers and with cooperative 
learning. The method can be slow; it is not easy for a computer to deal with infinite objects;
and the computer can only help indirectly with making proofs. Moreover, we 
cannot overemphasize the profound change in the teacher’s attitude and the need 
to develop new skills that are part of adopting this method. Nevertheless, our 
theoretical perspective, experiences and research, and those of others, as well as 
reports from our students, all convince us that the improvement —the revolution! 
— in student learning justifies the effort. 


SECOND SCENARIO: CONSTRUCTING LAGRANGE’S THEOREM 
AND ITS PROOF 


Background. This topic is treated about one-third of the way into the semester. At 
that time the students, as well as the software environment, have grown consider- 
ably smarter. In particular, we describe three points which are necessary for 
understanding the next scenario. 

1. At this point, the students are well-acquainted with groups, subgroups, and
cosets, as well as with various examples, notably the modular groups Z_n (with
addition mod n) and the symmetric groups S_n. All this knowledge is represented in
ISETL by various functions and other mathematical objects which the students
have constructed in previous activities. These are collected in a special
initialization file (called isetl.ini) on the computer’s disk (ISETL’s long-term
memory), and are automatically loaded each time ISETL is loaded. Thus isetl.ini
can be considered to be a (dynamically changing) extension of the language,
representing the collective, accumulative, official wisdom of the class.

2. The file isetl.ini at this point contains all the group and subgroup functions
(is_closed, has_identity, is_group, is_subgroup, etc.) as well as some standard
sets and operations such as Z12, a12 (addition mod 12), S4 (permutations of
{1, 2, 3, 4}), os (permutation product), etc. It also contains the func PR, written
previously by the students, which implements the definition of the “extended
product” in a group. This program constructs and returns a func which accepts two
inputs, decides if they are elements or subsets of the group and performs the
appropriate operation between two elements, an element and a subset, or two
subsets. Here is one version of it.


PR := func(G,o);
    return func(x,y);
        if x in G and y in G then return x .o y;
        elseif x in G and y subset G then return {x .o b : b in y};
        elseif x subset G and y in G then return {a .o y : a in x};
        elseif x subset G and y subset G then return
            {a .o b : a in x, b in y};
    end; end; end;


Executing a statement such as oo := PR(Z6, a6) will then make oo the name of 
the generalized product in this group. 
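
The same idea, a function that builds and returns another function, can be sketched in Python. This is a loose analogue rather than a translation: Python has no built-in way to ask whether a value is "an element or a subset" of G, so the helpers is_elem and is_subset below are assumptions (they presume that the group elements are not themselves sets), and returning None when no case applies is likewise a choice made for this sketch.

```python
def PR(G, op):
    """Return the extended product for the group (G, op): a function that
    multiplies two elements, an element and a subset, or two subsets."""
    G = frozenset(G)

    def is_elem(x):
        # Assumption: group elements are never Python sets themselves.
        return not isinstance(x, (set, frozenset)) and x in G

    def is_subset(x):
        return isinstance(x, (set, frozenset)) and x <= G

    def oo(x, y):
        if is_elem(x) and is_elem(y):
            return op(x, y)
        if is_elem(x) and is_subset(y):
            return frozenset(op(x, b) for b in y)
        if is_subset(x) and is_elem(y):
            return frozenset(op(a, y) for a in x)
        if is_subset(x) and is_subset(y):
            return frozenset(op(a, b) for a in x for b in y)
        return None  # inputs are neither elements nor subsets of G

    return oo


Z6 = set(range(6))
a6 = lambda x, y: (x + y) % 6
oo = PR(Z6, a6)

print(oo(2, 5))             # element with element
print(oo({0, 3}, 1))        # subset with element: a coset of {0, 3}
print(oo({0, 3}, {1, 4}))   # subset with subset
```

Note that PR itself never multiplies anything; it only packages G and op into the function oo, which is the "whole function" being returned.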

3. Finally, the file isetl.ini contains an additional program called name_group,
which assigns to a given group (and optionally a subgroup) all the standard
notations. For example, assume that Z12 has been assigned in ISETL to denote
the set of integers mod 12, a12 the operation of addition mod 12, and H3 the
subgroup {0, 3, 6, 9}. If we now execute name_group(Z12, a12, H3), then the
names G, o, e, i, oo, H, K and GmodH are automatically assigned the values
Z12, a12, 0, the inverse function, the extended product, H3, the coset function, and
the set of all right cosets of H3 in G, respectively.

Students are encouraged to always use name_group in their computer investigations.
The purpose is two-fold: First, using standard, short names facilitates the
computer work. Secondly (and more profoundly), we believe that using “generic”
names, and retaining the same names for different examples, helps students see
the example under investigation as being a “generic” example; that is, it helps
them in “seeing the general in the particular” (Mason & Pimm, 1984). This, we
believe, is a crucial step in the difficult and all-important processes of
generalization and abstraction.


Reality. The actual worksheet which the students receive appears in the left
column of Figure 2. All of the mathematical expressions that are listed can be
written mutatis mutandis in ISETL and evaluated on the computer. In the right
column are a few remarks about what the students are expected to do and what we
expect to be happening in their minds.

The entire activity represents about 60 minutes of cooperative lab work. 

In Figure 2 we see students engaged in computer activities, prompted by a 
worksheet. Based on the experience gained in these activities, students are now 
further engaged in doing mathematical tasks and calculations in the classroom. 
Each task is allotted a certain amount of time and the students work on them
cooperatively in their teams. While the students are working, the instructor has 
ample time to move around in the classroom, look at what the students are doing, 
answer questions and occasionally engage in a dialogue. The activity is followed by 
a class discussion. This may be the time for “official” summaries by the instructor.
Some of the outcomes of the class activity are further pursued in homework 
assignments. 

Figure 3 illustrates a typical classroom activity which follows the computer 
activities in the Lagrange’s theorem worksheet. 

At this point, the instructor may decide that all of the ingredients of the proof 
of Lagrange’s theorem are present in the classroom and, more importantly, have 
been sufficiently constructed in the minds of most of the students. The complete 
formal proof may now be assigned as homework. Alternatively, or additionally, the 
instructor may tie everything together by presenting the formal statement of 
Lagrange’s theorem and its proof. There’s also a third alternative: Following the 
homework assignment, the instructor discusses with the students their work on 
the theorem and its proof, and then hands out a written page with the
complete, polished proof on it (or refers them to the appropriate page in the 
course text). 
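
The ingredients in question (the cosets cover the group, distinct cosets are disjoint, all cosets have the same size, and hence #G = #H * #GmodH) can be checked mechanically on the worksheet's three examples. The Python sketch below is an illustration under stated assumptions, not the authors' ISETL: the names right_cosets and check_lagrange are hypothetical, and S3 is modelled as permutations of {0, 1, 2} stored as tuples.

```python
from itertools import permutations


def right_cosets(G, op, H):
    # The set of all right cosets Ha = {h.a : h in H}, for a in G.
    return {frozenset(op(h, a) for h in H) for a in G}


def check_lagrange(G, op, H):
    """Return four booleans: the cosets cover G, distinct cosets are
    disjoint, all cosets have #H elements, and #G = #H * #GmodH."""
    cosets = right_cosets(G, op, H)
    covers = set().union(*cosets) == set(G)
    disjoint = all(c == d or not (c & d) for c in cosets for d in cosets)
    same_size = all(len(c) == len(H) for c in cosets)
    counts = len(G) == len(H) * len(cosets)
    return covers, disjoint, same_size, counts


Z12 = set(range(12))
a12 = lambda x, y: (x + y) % 12
H3 = {0, 3, 6, 9}
H4 = {0, 4, 8}

# S3 as tuples p with p[i] the image of i; the product is composition.
S3 = set(permutations(range(3)))
compose = lambda p, q: tuple(p[q[i]] for i in range(3))
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}

for G, op, H in [(Z12, a12, H3), (Z12, a12, H4), (S3, compose, A3)]:
    print(len(G), len(H), len(right_cosets(G, op, H)),
          check_lagrange(G, op, H))
```

A check of this kind only confirms the pattern on particular examples; the proof still has to explain why the first three properties hold in every finite group.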


Reflections. As in the first scenario, we now attend to some questions,
observations and conjectures as might be posed by our “Idealized Reader”.


IR: The func for PR looks to me like a hard program to write. Doesn't it give 
students a lot of trouble, and doesn’t it involve too much programming effort? 


No on both counts. The main programming ingredient is the “if statement” with
all the “elseif” clauses, and by the time we get to this activity, the students are
quite comfortable with such constructs. Other than this, the program is almost
identical to the mathematical definition of this operation.


Figure 2. Tasks on cosets

Worksheet                                 Remarks

H3 := {0, 3, 6, 9};                       Students are asked to predict the result of
name_group(Z12, a12, H3);                 each expression, then enter it and resolve
G; #G; 8 .o 9; is_group(G, o);            any conflicts between their predictions and
e; i(7);                                  the actual result. The goal for this first
H; is_subgroup(G, o, H);                  group of expressions is for students to
                                          refresh their memory about the meaning
                                          and interconnections of the many symbols
                                          involved in this activity.

H .oo 0; H .oo 1; H .oo 2;                Here students are moving on to familiarize
H .oo 3; H .oo 4; ...                     themselves with cosets and their various
K(0); K(1); K(2); K(3); K(4); ...         notations. The operation K maps an element
                                          of G to its right H-coset. The “three dots”
                                          prompt the students to look for a pattern.

GmodH; #(GmodH);                          These are further prompts for exploring the
H subset G; H subset GmodH;               set GmodH of all the right cosets of H in
H in GmodH; G in GmodH;                   G, with a view to generality (again, using the
K(1) in GmodH; K(2) in GmodH;             predict-enter-resolve cycle). The operator
{1, 4, 7, 10} in GmodH;                   “%union” is the extension of the binary
{1, 4, 7, 10} subset GmodH;               operator “union” to operate on any (finite)
%union(GmodH) = G;                        set of sets.
forall a in G | a in K(a);

Find as many equalities as you can        Computer exploration that could lead to one
among the sets H, K(0), ..., K(4), and    of the key steps in the proof of Lagrange’s
among their cardinalities.                theorem.
Verify that your equalities return
“true”.

When is it the case that the relation     Students are confronted with another issue in
H .oo a = H .oo b is true?                the proof.

Find examples of i /= j in G such that    The last major ingredient of the proof. (The
(one condition at a time):                symbol “/=” is ISETL’s way of approximating
(a) K(i) = K(j),                          the mathematical “≠” on the standard
(b) K(i) inter K(j) = { },                keyboard.)
(c) neither (a) nor (b) holds.

What is the relation between the          Based on their previous activities and
numbers                                   explorations, the students are now ready to try to
#G, #H and #GmodH?                        discover Lagrange’s theorem. Furthermore,
Can you see an explanation?               since they have also discovered the properties
                                          of cosets which are responsible for the
                                          truth of the theorem, they now have a pretty
                                          good feeling for why the theorem is true.

Do name_group(Z12, a12, H4), where        If a student has begun to construct the idea of
H4 = {0, 4, 8}, and predict the           Lagrange’s theorem and its proof, then
answers to all the previous activities.   looking at a second example may help bring
Check your predictions on the             it out.
computer.
Does the relation you found between
#G, #H and #GmodH still hold?

Repeat the same activity with             More of the same with a non-commutative
name_group(S3, os, A3).                   example.


Figure 3. Worksheet on Lagrange’s Theorem

Tasks                                     Outcomes

Formulate a conjecture regarding the      Some teams come up with the conjecture that
orders of a finite group G, a             the number of cosets times the order of the
subgroup H, and the set of cosets         subgroup is equal to the order of the group.
GmodH.                                    Fewer state that the order of the subgroup
                                          divides the order of the group. Some students
                                          do not see either of these explicitly,
                                          but have it on the tip of their tongues, so to
                                          speak: they recognize it readily upon hearing
                                          it from their peers.

Formulate these points in symbols.        Formulas such as
                                              #G = #H * #GmodH
                                              o(G) = o(H) * #GmodH
                                              #H | #G
                                          usually arise.

What are some properties of cosets        The following are easy to observe from the
which seem to “cause” these               previous activities, and some of them are
relations? Can you show how your          commonly picked up by the students:
conjecture follows from the               - H is one of the cosets.
properties you have listed?               - Every element belongs to some coset.
                                          - The union of all the cosets is the whole
                                            group.
                                          - Different cosets are disjoint.
                                          - If a and b belong to the same coset then
                                            Ha = Hb.
                                          - All the cosets have the same number of
                                            elements.


IR: I notice that this program is supposed to create and return a new operation. I 
find that pretty confusing. Doesn’t it give the students trouble? 


Yes, this is really difficult. This particular func takes as input a set and an 
operation (which is a function) and returns the generalized operation (which is also 
a function). One of the hardest things for students to do is to treat functions as 
total entities, or objects, and perform operations on them (Sfard, 1992). We see 
this in many situations, not only in algebra, but also in calculus and other subjects. 
Students expect a func to return a number or even a set of numbers and major 
mental adjustments are required before the student can understand that the func 
is to return a “whole function”. 

But this difficulty is with the mathematics, not the programming. We have done 
studies which suggest that confronting this kind of mathematical difficulty in a 
computer context can help students develop the ability to work with functions as 
objects (Ayres, Davis, Dubinsky, and Lewin, 1986; Breidenbach, Dubinsky, Hawks,
and Nichols, 1991).


IR: Some of the expressions your students write are more formal than in the usual 
abstract algebra classroom. In my experience, students find such formalities quite 
hard and bizarre. 


This had been our experience too—as long as we were trying to present such 
formalities in a lecture. We were surprised and delighted to see students arriving
at such subtleties on their own, aided by feedback from the computer and the
on-going discussions. The ISETL medium makes it both necessary and possible for 
students to deal with such subtleties. The need to “explain” a mathematical 
concept to a dumb computer is a wonderful motivation for using formal language; 
watching how the beast “understands” (or mis-understands) your explanations, is a 
great way to climb to the next step in the ladder of successive refinements. 

We would like to suggest that there is a distinction between formalism and 
empty formalism. When students are asked to deal with a formal statement before 
they have an opportunity to construct the processes and objects that the formality 
describes, then it is empty formalism. When the formal statement is a description 
of ideas which already exist in the student’s mind, then the symbolism becomes a 
convenient way of communicating these ideas and can even be a powerful tool for 
further mathematical growth. 


IR: Asking the students (in your worksheet) to characterize the relation “Ha = Hb” 
seems to me like a regular pencil-and-paper exercise. 


This is true, and if a student prefers to do it this way, that’s fine with us. However, 
many students merely get stuck when facing such a question with only pencil-and- 
paper as working tools. In the ISETL lab, students still make errors but they rarely 
get stuck, since they can utilize the computer to analyze examples and conduct 
investigations. In our experience, most students prefer an environment in which 
they can utilize such “experimental scaffolding” to aid the theoretical reasoning. 


IR: Do the students really discover all of those coset properties on their own? 


Some do and some don’t. It is wonderful for those who do, but again, the point is 
that this is not necessary. The effort that students put into trying to discover these 
facts is sufficient to change the way in which they respond internally when they 
hear it from their classmates, or from the instructor. 


IR: Isn’t it frustrating for the students who fail to accomplish the given tasks?


It is very important for the instructor to make sure students don’t conceive of this 
as failure. In our classes, we explicitly discuss this issue with the students, and 
emphasize many times that the main point is spending the time and effort on the 
problem, not solving it. 


IR: Well, you have said a lot about alternatives to lecturing, but here, in the end, 
what you are doing with Lagrange’s theorem is to give a lecture about the theorem 
and its proof. 


We have no objection to lecturing as such. Our main claim is that introducing new 
material via a lecture may be a very effective teaching method, but it is mostly a 
very ineffective learning method. We use (short!) lectures to summarize and 
elaborate something the students have previously spent considerable time and 
effort working on. Ideally, the lecture is on something the students are already 
familiar with; it only serves to present it in a more explicit, general, precise, formal 
way. 


IR: The activities have directed the students to some ingredients of the proof, but 
many remain untouched. For example, they may have discovered that different cosets 
are disjoint, but there’s nothing in the activities to help them actually prove this.


True. In a way we may say that the students now see the proof as being based on a
few “main ideas” of the sort you have mentioned, but they haven’t yet examined
the proof of these main ideas themselves. This “layered” view of the proof is called
a structural proof and actually has some advantages (Leron, 1983, 1985). In this
sense we might say that the students are now familiar with the “top level” of the
proof, but they still need to supply the details of the lower levels. In practice, these
missing details are usually assigned as homework.


THIRD SCENARIO: A GENTLE INTRODUCTION TO NORMALITY 
AND QUOTIENT GROUPS 


Background. The situation at this point in the course is similar to that described at
the beginning of the second scenario, and we are making similar use of the file
isetl.ini and the procedure name_group discussed there. At this point the students
have at their disposal, and are pretty comfortable with, the set GmodH of all right
cosets of a subgroup H in G, and the “extended product” operation oo defined on
it. This operation has already been used extensively to construct and investigate
cosets, but so far the students have hardly used the fact that the same operation can
also be used to multiply two cosets.


Note: We have chosen here a somewhat non-standard route to approaching the 
notions of normality and the quotient group. In particular, the set GmodH used in 
the ISETL activities is defined whether H is normal or not. Also, we are initially 
making an unusual choice for our definition of coset product. Eventually, for a 
normal subgroup H in G, GmodH and this product coincide with the usual G/H 
and the usual definition of product “by representatives”. But with our approach
they can be used to great advantage before normality has even been introduced. 

The activity we are about to discuss takes place a little before we are half-way 
through the course. It typically starts with a short (10-15 minutes) class discussion, 
in which the previous activities and results concerning subgroups and their cosets
are recalled. 

It is then noted that, since oo enables us to multiply any two cosets, the 
question naturally arises as to whether (or, better yet, when) GmodH is a group 
under this operation. Next, the students do some “warm-up” activities on the
computer to remind them of various definitions and concepts. The students then
proceed to do a hands-on investigation, guided by worksheets as shown in
Figures 4 and 5.


Reality—Normality and Quotient group worksheet. 


Note: As a space-saving device, we are writing several expressions to a line; in 
reality the students enter them one by one. 

The computer activities shown in Figures 4 and 5 last for about 50 minutes. The 
students are given instructions which prompt them to focus their attention on the 
relevant aspects of a complex situation. One of the pleasant features of this kind of 
non-prescriptive learning environment is that those who can make sense of such
advice will do so, while those who can’t will just ignore it (at least for the
moment), with no harmful side-effects.

Here are the instructions given to the students. 
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Figure 4. Products of cosets 


Following the warm-up activities the students start to look into the product of cosets and its 
properties. Some of these expressions are saying the same thing, but with different notation and on 
different levels. It is important (and non-trivial) for students to be able to experience these different 
levels of expression as being “the same”. As for the main question, at this point many tend to 
over-generalize, believing that GmodH is always a group.


> name_group(Z12, a12, {0, 3, 6, 9});
Group objects defined: G, o, oo, e, i.
Subgroup objects defined: H, GmodH, K.
> K(1) .oo K(2);
{0, 3, 6, 9};
> K(1) .oo K(2) in GmodH;
true;
> K(1) .oo K(2) = K(0); K(0) .oo K(1) = K(1); K(2) .oo K(4) /= K(9);
true; true; false;
> forall a, b in G | K(a) .oo K(b) = K(a + b);
false;
> forall a, b in G | K(a) .oo K(b) = K(a .o b);
true;
> forall a in G | H .oo a = a .oo H;
true;
> exists x, y in GmodH | x .oo y notin GmodH;
false;
> is_closed(GmodH, oo);
true;
> is_group(GmodH, oo);
true;
> identity(GmodH, oo);
{0, 3, 6, 9};


In the following activities, predict the answer and then check on the computer. 
Try to settle any discrepancies that arise. Please pay special attention to the 
following points and write down any observations you may come up with. 

• When is the product of two cosets again a coset?

• When is (Ha)(Hb) = H(ab)?

• When is GmodH closed under the operation oo?

• When is GmodH a group?


The last part of the classroom session lasts about 20 minutes, and will not be 
described here in detail. The instructor leads an interactive discussion in which the 
different teams share their discoveries, conjectures and questions. By building on 
the material that the students bring up, the instructor is able to state most 
naturally and smoothly the definition of a normal subgroup, the theorem that when 
H is normal then GmodH forms a group, and the (now very easy) proof of this 
theorem. Normality is naturally introduced here as the condition which ensures
that GmodH be a group, and the definition most often discovered by the students
is aH = Ha for all a ∈ G. Except for the new name, the students can really feel
that the instructor merely summarizes what they have found in their investigations.
In the session that follows, the instructor makes the final ties with the “standard”
approach by explaining that when H is normal, GmodH is commonly denoted
G/H and is called the quotient group of G modulo H, and that coset product is
commonly defined by the formula (Ha)(Hb) = H(ab). In our approach, this
formula receives the status of a theorem that comes up in the activities. 
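
The Figure 4 session can be imitated outside ISETL. The following is a minimal Python sketch, not the authors' ISETL code: it assumes that the operation oo multiplies two subsets elementwise, and the names `coset`, `set_product` and `G_mod_H` are hypothetical stand-ins for K, oo and GmodH.

```python
from itertools import product

# Z12 under addition mod 12, with the subgroup H = {0, 3, 6, 9} of Figure 4.
G = set(range(12))
H = frozenset({0, 3, 6, 9})


def op(a, b):
    """Group operation of Z12 (the role of o)."""
    return (a + b) % 12


def coset(a):
    """Right coset Ha (the role of K(a))."""
    return frozenset(op(h, a) for h in H)


def set_product(X, Y):
    """Elementwise product of two subsets (assumed meaning of oo)."""
    return frozenset(op(x, y) for x, y in product(X, Y))


G_mod_H = {coset(a) for a in G}

# Is the product of two cosets again a coset?
closed = all(set_product(X, Y) in G_mod_H
             for X, Y in product(G_mod_H, repeat=2))

# Does (Ha)(Hb) = H(ab) hold for all a, b?
by_representatives = all(set_product(coset(a), coset(b)) == coset(op(a, b))
                         for a, b in product(G, repeat=2))

print(len(G_mod_H), closed, by_representatives)
```

Because Z12 is commutative, both checks succeed here, which is exactly why this first example tempts students to over-generalize.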
Figure 5. A more complicated case


The second round through the activities, which we omit, repeats the experience of Figure 4 with a 
non-normal example. At this stage, some students over-react and decide that GmodH is a group if 
and only if G is commutative. Coming to the most “mature” example so far (a normal subgroup in 
a non-commutative group), students start to realize that they need to look deeper into the properties 
of H that make GmodH a group. They also come to appreciate that the main issue here is closure, 
namely, when is the product of two cosets again a coset. More specifically, when is it the case that 
HaHb = Hab? This leads rather smoothly and naturally to the desired property Ha = aH, which is 
one way of defining normality. It is very important to note that the way they have “constructed” it 
for themselves, normality is immediately perceived as a way to ensure that GmodH is a group.


> A3 := {[1, 2, 3], [2, 3, 1], [3, 1, 2]};
> name_group(S3, os, A3);
Group objects defined: G, o, oo, e, i.
Subgroup objects defined: H, GmodH, K.
> G; #G;
{[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]}; 6;
> H; #H;
{[1, 2, 3], [2, 3, 1], [3, 1, 2]}; 3;
> GmodH; #GmodH;
{{[1, 2, 3], [2, 3, 1], [3, 1, 2]}, {[1, 3, 2], [2, 1, 3], [3, 2, 1]}}; 2;
> forall p, q in G | K(p) .oo K(q) = K(p .o q);
true;
> forall p in G | H .oo p = p .oo H;
true;
> is_group(GmodH, oo);
true;
> identity(GmodH, oo);
{[1, 2, 3], [2, 3, 1], [3, 1, 2]};
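
The contrast between a normal and a non-normal subgroup of S3 can be sketched in the same spirit. This is an illustrative Python sketch under stated assumptions, not the ISETL worksheet: permutations are stored as tuples of images, the subgroup `T` generated by one transposition is our own choice of non-normal example, and all function names are hypothetical.

```python
from itertools import permutations, product

# S3 as tuples of images: p = (p(1), p(2), p(3)).
G = set(permutations((1, 2, 3)))


def compose(p, q):
    """(p o q)(i) = p(q(i))."""
    return tuple(p[q[i] - 1] for i in range(3))


def set_product(X, Y):
    """Elementwise product of two subsets (assumed meaning of oo)."""
    return frozenset(compose(x, y) for x, y in product(X, Y))


def is_normal(H):
    """Check Hg = gH for every g in G."""
    return all(set_product(H, [g]) == set_product([g], H) for g in G)


def quotient_is_closed(H):
    """Check that the product of any two right cosets is again a right coset."""
    cosets = {set_product(H, [g]) for g in G}
    return all(set_product(X, Y) in cosets
               for X, Y in product(cosets, repeat=2))


A3 = frozenset({(1, 2, 3), (2, 3, 1), (3, 1, 2)})   # the subgroup of Figure 5
T = frozenset({(1, 2, 3), (2, 1, 3)})               # identity and one transposition

for H in (A3, T):
    print(len(H), is_normal(H), quotient_is_closed(H))
```

In this sketch closure fails precisely for the subgroup that is not normal, which is the observation the activities are designed to provoke.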


Reflections. 


Idealized Reader: In my experience, the quotient group is one place in the course 
where most students experience a real crisis of meaning. They just stare at the 
symbols, or hear the instructor’s words, and all they see (or hear) is so many ink 
stains (or so many words). Is there a difference in your class? 


There’s no question that this is a difficult concept, and in our class too students 
have to struggle hard. However, because of the computer activities and the 
accompanying discussions in the teams, they now have a way of constructing 
meaning by “successive refinement”, as they keep updating and refining their 
understanding by watching and trying to understand the computer’s response to 
the mathematical expressions they enter. 


IR: What do you mean by “constructing meaning”?


We believe that this issue is at the heart of students’ difficulties in abstract algebra 
and mathematics in general. Students often have great difficulty with a theorem 
and its proof, such as the Homomorphism theorem or even Lagrange’s theorem. 
We tend to explain this difficulty by saying “the theorem is complicated”. But is it 
really the theorem that is so complicated? The theorem and its proof are, 
essentially, about certain relationships which hold between certain mathematical 
objects. These relationships are fairly simple and we believe that the difficulty 
experienced by the students lies not so much in the complexity of the theorem, as in 
the abstract nature of the mathematical objects involved. 


IR: Can you give an example? 


Yes, let’s look at the relationships involved in Lagrange’s theorem and its proof: 
one number dividing the other, two sets having the same cardinality or being 
disjoint, etc. These are easy enough. The objects, on the other hand, are “complicated”
in the sense of the many levels of abstraction and the great time and
effort needed to “construct” them in the mind of the student. For us, the
simplicity of the proof lies in our ability to have a clear image of the group as being
partitioned into a disjoint union of cosets. But in order for the students to have
such an image, they need to “construct” in their mind not only “group”, “subgroup”
and “coset” but also “the set of all cosets of H in G”.

One powerful aspect of the computer environment we have been describing is 
that constructing these new entities on the computer helps students see them as 
real things that have their own properties, and that can be manipulated, investigated
and discussed.


IR: How can you tell that your method really makes a difference? 


The fact is that students in our classes don’t feel the same alienation and paralysis 
in the face of the quotient group, and end up with a good understanding of it. This 
is borne out in the research that accompanies our teaching, both in listening to the 
students’ own subjective evaluation of the method and their learning in it, and by 
in-depth interviews of many individual students. At the very least, they have little 
difficulty, even on exams, in constructing specific quotients, for example in S,, and 
identifying the resulting group as being isomorphic to some known groups. Our 
data suggests that this understanding tends to carry over into ring theory where 
they can learn to construct and analyze quotients of polynomial rings by various 
ideals. 


IR: How can you explain the improvement in students’ understanding?


We think it is due to our method which is based on modern understanding of how 
people learn. This involves a dramatic change in the basic assumptions about how 
learning occurs: More and more people in our profession are beginning to embrace 
the notion that knowledge is not transferred from one person (the instructor) to 
another (the student), but rather is constructed in the learner’s mind. This mental 
construction takes time and effort and requires an appropriate learning environment,
the design of which is informed by research into how mathematics is learned.


IR: Exactly how do you use research here? 


The role of research in this context is to propose specific mental constructions that 
students can make in order to learn specific mathematical concepts, and to 
propose and evaluate methods for fostering such constructions. For example, to go 
into a little more detail on a point that was discussed earlier, our research suggests 
that students’ difficulty with understanding Lagrange’s theorem may be largely due 
to their confusion about the nature of cosets. We find that they can understand the 
process of forming a coset, but often cannot take the next step of seeing these 
cosets as objects to be measured, counted and compared. Thus, we design 
computer activities aimed at getting students to construct cosets on the computer 
and then treat them as objects by manipulating them in various ways. We have
done this for a number of mathematical concepts and this work forms a major
component of our efforts. We are beginning to develop a body of literature about
it which the reader can consult (Ayres et al., 1986; Breidenbach et al., 1991;
Dubinsky, 1986, 1989; Dubinsky, Dautermann, Leron and Zazkis, 1994; Dubinsky,
Elterman and Gong, 1989; Dubinsky and Leron, 1994).


CONCLUSION. In this paper we have attempted to describe a new paradigm for 
teaching undergraduate mathematics in general and abstract algebra in particular. 
We have started out from what we perceive as a general discontent by instructors 
and students alike with the state of teaching abstract algebra, and have marshaled 
all the resources at our disposal to develop this new paradigm. These resources 
include contemporary theory and research into the mental processes involved in 
learning mathematics, collaborative learning methods, the use of computers 
equipped with a custom-made programming language, and our own experience 
over many years of teaching undergraduate mathematics.

These methods are not restricted to Abstract Algebra but have also been 
applied to courses in Discrete Mathematics, Precalculus and Calculus. 

We have presented here a “constructivist” approach, according to which telling
students about mathematical processes, objects and relations is not sufficient to 
induce meaningful learning (hence the sorry state of affairs even with the best of 
lecturers). What is required, rather, is a learning environment which encourages 
and enables students to make mental constructions corresponding to these mathematical
processes, objects and relations. Forming such learning environments is a
highly non-trivial educational task and, in our opinion, the best way of inducing 
such mental constructions is by having students make appropriate constructions on 
a computer, and by using the social context to reflect on the computer activities. 
Our main tool for making constructions on the computer has been programming in 
ISETL, while our main tool for reflecting on the activities has been collaborative 
learning. Along with the computer activities and the accompanying discussions in 
teams, we of course also use the standard tools of class discussion, short summaries 
by the instructor and homework assignments. However, it is our belief that these 
tools are most effective when applied after the activities. 

Making the transition to teaching with this constructivist, interactive method is 
not easy. In addition to making fundamental changes in long-held attitudes about 
teaching and learning, the instructor must make a substantial initial investment of 
time and energy for learning techniques very different from the standard lecture 
method. But our experience and research are showing that this extra effort on the
part of the instructor is extremely well rewarded by the change in students’ 
attitudes towards the course and mathematics in general, and by the amount of 
meaningful learning which is achieved. 
“I think you’re begging the question,” said
Haydock, “and I can see looming ahead one of
those terrible exercises in probability where six
men have white hats and six men have black
hats and you have to work it out by mathematics
how likely it is that the hats will get mixed
up and in what proportion. If you start thinking 
about things like that, you would go round the 
bend. Let me assure you of that!” 


—Agatha Christie 
The Mirror Crack’d. Toronto: Bantam Books, 1962.


Answer to Picture Puzzle 
(p. 226) 


André Weil 
A Multidimensional Version
of Rolle’s Theorem 


Massimo Furi and Mario Martelli 


In this paper we obtain for functions $f: \mathbb{R}^n \to \mathbb{R}^p$ a version of Rolle’s Theorem
which we hope the readers will find useful and interesting for the following
reasons. Three fundamental results from Calculus, namely Rolle’s Theorem, the
Mean Value Theorem and the Cauchy Generalized Mean Value Theorem, can be
easily derived from it. The version has intuitive geometrical applications and the
proof is very simple. 

Teachers may find it appropriate to incorporate our result in a course on 
Multivariable Calculus, since it provides an example of how certain one-dimensional
theorems can be rephrased in higher dimensional spaces, and it shows that
by expanding our mathematical horizon we frequently gain in organization and 
unity. Professional mathematicians are all familiar with these facts, but students 
will surely derive from them a motivation to learn more. 

The basic idea of our result is to assume a certain behavior of $f$ on the
boundary $\partial R$ of an $n$-dimensional region $R$ (in the real line this behavior reduces to
the familiar condition $f(a) = f(b)$) to obtain information on the derivative of $f$ at
an interior point of $R$. Of particular relevance to the result is the Mean Value
Theorem of Sanderson [10] for a function $v: [a, b] \to \mathbb{R}^p$. We extend his theorem
to functions of several variables.

The paper ends with an additional, more general version of Rolle’s Theorem, 
and with an open problem and a conjecture which will hopefully stimulate the 
reader’s mathematical curiosity. 

We now list the terminology used and the results needed in the sequel.
$0(m \times n)$ stands for the zero matrix with $m$ rows and $n$ columns. $x \cdot y$ denotes the
Euclidean inner product between $x$ and $y$, and the norm of $x$ is $\|x\| = \sqrt{x \cdot x}$. We
repeatedly make reference to the following sets:

\[ D(x_0, r) = \{x \in \mathbb{R}^n : \|x - x_0\| \le r\}, \quad B(x_0, r) = \{x \in \mathbb{R}^n : \|x - x_0\| < r\}, \]
\[ S(x_0, r) = \{x \in \mathbb{R}^n : \|x - x_0\| = r\} = \partial D(x_0, r). \]


The two propositions below play a key role in the proof of our multi-dimensional
version of Rolle’s Theorem.


Proposition 1. Let $f: D(x_0, r) \subset \mathbb{R}^n \to \mathbb{R}$ and let $c \in B(x_0, r)$ be an extremum point
of $f$. Assume that $f$ is differentiable at $c$. Then $f'(c) = 0(1 \times n)$.


Proposition 2. Let $f: D(x_0, r) \subset \mathbb{R}^n \to \mathbb{R}$ be continuous. Then the image of $f$ is a
closed and bounded interval $[m, M]$.
We point out that the proof of Rolle’s Theorem in $\mathbb{R}$ is based on the one-dimensional
version of the two propositions.


Results. The following simple example shows that a straightforward reformulation
of Rolle’s Theorem in $\mathbb{R}^n$, $n \ge 2$, fails.


Example 1. Let $f: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

\[ f(x, y) = \left(x(x^2 + y^2 - 1),\ y(x^2 + y^2 - 1)\right). \]

The function $f$ is continuous on $D(0, 1)$, is differentiable on $B(0, 1)$ and $f(x) = 0$
for every $x \in S(0, 1)$. However, $f'(x) \ne 0(2 \times 2)$ for all $x \in B(0, 1)$.
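
The claim about the derivative in Example 1 can be checked by hand. The following short computation is our own verification, not part of the original argument; it uses only the definition of $f$ given above.

```latex
f'(x, y) =
\begin{pmatrix}
3x^2 + y^2 - 1 & 2xy \\
2xy & x^2 + 3y^2 - 1
\end{pmatrix}.
% If f'(x, y) were the zero matrix, then xy = 0.
% Case x = 0: the diagonal entries give y^2 = 1 and 3y^2 = 1, which is impossible.
% Case y = 0: the diagonal entries give 3x^2 = 1 and x^2 = 1, which is impossible.
% Hence f'(x, y) \neq 0(2 \times 2) at every point of \mathbb{R}^2, in particular on B(0, 1).
```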
We are now ready to state and prove our main result. 


Theorem 1. Let $f: D(x_0, r) \subset \mathbb{R}^n \to \mathbb{R}^p$ be continuous on $D(x_0, r)$ and differentiable
on $B(x_0, r)$. Assume that there exists a vector $v \in \mathbb{R}^p$ such that


i) $v$ is orthogonal to $f(x)$ for every $x \in S(x_0, r)$.


Then there exists a vector $c \in B(x_0, r)$ such that $v \cdot f'(c)u = 0$ for every $u \in \mathbb{R}^n$.


Proof: Let $k: \mathbb{R}^p \to \mathbb{R}$ be defined by $k(x) = v \cdot x$. Set $g(x) = k(f(x))$. By Proposition 2
the image of $g$ is a bounded and closed interval $[m, M]$. Assumption i)
implies that $g$ is 0 on $S(x_0, r)$. Hence we may assume, without loss of generality,
that $g$ reaches its maximum value, $M$, at a point $c \in B(x_0, r)$, namely $M = g(c)$.
By Proposition 1, $g'(c) = 0(1 \times n)$, i.e. $v \cdot f'(c)u = 0$ for every $u \in \mathbb{R}^n$. QED.


Remark 1. Assumption i) can be replaced by the equivalent statement

“i1) $v \cdot f(x)$ is constant on $S(x_0, r)$”;

and the conclusion of the theorem can be expressed in the equivalent but
geometrically more intuitive way

“$v$ is orthogonal to the vectors $\dfrac{\partial f}{\partial x_1}(c), \ldots, \dfrac{\partial f}{\partial x_n}(c)$”.


Remark 2. $D(x_0, r)$ can be replaced by the closure of any open, bounded and
connected set $R$ of $\mathbb{R}^n$.


Rolle’s Theorem, the Mean Value Theorem and the Cauchy Generalized Mean 
Value Theorem are easily derived from Theorem 1. 


Corollary 1 (Cauchy). Let $a < b$ and $f, g: [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and
differentiable on $(a, b)$. Then there exists $c \in (a, b)$ such that

\[ [f(b) - f(a)]\, g'(c) = f'(c)\, [g(b) - g(a)]. \]


Proof: If $f(a) = f(b)$ and $g(a) = g(b)$ there is nothing to prove.

Assume $[f(b) - f(a)]^2 + [g(b) - g(a)]^2 > 0$. Define $T: [a, b] \to \mathbb{R}^2$ by $T(t) =
(g(t), f(t))$. Let $v = (f(b) - f(a),\ g(a) - g(b))$. Then $v \cdot T(a) = v \cdot T(b) =
f(b)g(a) - f(a)g(b)$. Hence, according to Theorem 1 (see Remark 1), there is a
point $c \in (a, b)$ such that $v \cdot T'(c)t = 0$ for every $t \in \mathbb{R}$. With $t \ne 0$ we obtain
$[f(b) - f(a)]\, g'(c) = f'(c)\, [g(b) - g(a)]$.
Setting $g(x) = x$ gives the Mean Value Theorem. If, in addition, $f(b) = f(a)$,
then we have Rolle’s Theorem. QED


The next Corollary is the Mean Value Theorem of Sanderson [10] mentioned in 
the Introduction. 


Corollary 2. Let $a < b$ and $v: [a, b] \to \mathbb{R}^p$ be $k$ times differentiable. Assume that
$v(a)$, $v(b)$ and the first $k - 1$ derivatives of $v$ at $a$ are orthogonal to a non-zero vector $v_0$.
Then, for some $c \in (a, b)$, $v^{(k)}(c)$ is orthogonal to $v_0$.


Proof: From Theorem 1 we derive the existence of a point $c_1 \in (a, b)$ such that $v_0$
is orthogonal to $v'(c_1)$. The theorem can now be applied to $v'$ in the interval $[a, c_1]$
to yield a point $c_2 < c_1$ such that $v_0$ is orthogonal to $v''(c_2)$. This procedure can be
repeated $k - 1$ times to obtain $c = c_k < c_{k-1}$ such that $v_0 \cdot v^{(k)}(c) = 0$. QED


A recent result of Evard and Jafari [4] (see also [7]) follows from Theorem 1. 


Corollary 3. Let $\mathbb{C}$ be the field of complex numbers and $f: \mathbb{C} \to \mathbb{C}$ be a holomorphic
function. Assume that there are points $a \ne b$ such that $f(a) = f(b)$. Then there exist
$z_1, z_2$ in the open line segment joining $a$ with $b$ such that $\mathrm{Re}(f'(z_1)) = \mathrm{Im}(f'(z_2)) = 0$.


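Proof: Let $f(z) = f(x + iy) = u(x, y) + i\,v(x, y)$ and $p \in \mathbb{R}^2$, $p = (p_1, p_2) =
(\mathrm{Re}(a), \mathrm{Im}(a))$, $q \in \mathbb{R}^2$, $q = (q_1, q_2) = (\mathrm{Re}(b), \mathrm{Im}(b))$. Define
$g(t) = (u(q + t(p - q)),\ v(q + t(p - q)))$, $t \in [0, 1]$. Notice that $g(0) = g(1)$. According to Theorem 1,
for every $x \in \mathbb{R}^2$, $x \ne 0$, there exists $t_0 \in (0, 1)$ such that $x \cdot g'(t_0)t = 0$ for every
$t \in \mathbb{R}$. Let $t = 1$ and choose the vector $x_1 = (p_1 - q_1, p_2 - q_2)$. Then

\[ 0 = x_1 \cdot g'(t_0) = \frac{\partial u}{\partial x}(g(t_0))(p_1 - q_1)^2 + \frac{\partial u}{\partial y}(g(t_0))(p_1 - q_1)(p_2 - q_2) \]
\[ \qquad + \frac{\partial v}{\partial x}(g(t_0))(p_1 - q_1)(p_2 - q_2) + \frac{\partial v}{\partial y}(g(t_0))(p_2 - q_2)^2. \]

Since $f$ is holomorphic, its real and imaginary parts satisfy the Cauchy-Riemann
equations (see [1]), i.e. $\partial u/\partial x = \partial v/\partial y$ and $\partial u/\partial y = -\partial v/\partial x$. Hence

\[ \frac{\partial u}{\partial x}(g(t_0))\left[(p_1 - q_1)^2 + (p_2 - q_2)^2\right] = 0. \]

This implies $\partial u/\partial x(g(t_0)) = \mathrm{Re}(f'(z_1)) = 0$, where $z_1 = q + t_0(p - q)$.
To obtain the other equality use the vector $x_2 = (q_2 - p_2, p_1 - q_1)$. QED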
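
A concrete instance may help to see what Corollary 3 asserts. The example below is our own illustration, not taken from the paper: it takes $f(z) = e^z$ with the assumed endpoints $a = 0$ and $b = 2\pi i$.

```latex
f(z) = e^{z}, \qquad a = 0, \quad b = 2\pi i, \qquad f(a) = f(b) = 1.
% Points of the open segment joining a with b have the form z = iy with 0 < y < 2\pi, and
f'(iy) = e^{iy} = \cos y + i \sin y .
% The real part vanishes at y = \pi/2 and the imaginary part at y = \pi, so
z_1 = \tfrac{\pi}{2}\, i, \qquad z_2 = \pi i,
\qquad \mathrm{Re}\, f'(z_1) = \mathrm{Im}\, f'(z_2) = 0 .
```

Note that $f'(z) = e^z$ never vanishes, so the two points $z_1$ and $z_2$ cannot in general be taken equal.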


Theorem 1 can be given a slightly more general form. 


Theorem 2. (Second version of Rolle’s Theorem in $\mathbb{R}^n$). Let $f: D(x_0, r) \subset \mathbb{R}^n \to \mathbb{R}^p$
be continuous on $D(x_0, r)$ and differentiable on $B(x_0, r)$. Let $v \in \mathbb{R}^p$, $z_0 \in B(x_0, r)$
be such that


ii) $v \cdot (f(x) - f(z_0))$ does not change sign on $S(x_0, r)$.


Then there exists a vector $c \in B(x_0, r)$ such that $v \cdot f'(c)u = 0$ for every $u \in \mathbb{R}^n$.
Proof: We may assume, without loss of generality, that $v \cdot (f(x) - f(z_0)) \le 0$ for
all $x \in S(x_0, r)$. This implies the existence of a point $c \in B(x_0, r)$ such that
$v \cdot f(c) = M$, where $M = \max\{v \cdot f(x) : x \in D(x_0, r)\}$. Consequently, $v \cdot f'(c)u = 0$
for all $u \in \mathbb{R}^n$. QED


Remark 3. In the case when $n = p = 1$ Theorem 2 says that if for some $z \in (a, b)$
we have


j) either $f(z) \ge \max\{f(a), f(b)\}$, jj) or $f(z) \le \min\{f(a), f(b)\}$,
then there exists $c \in (a, b)$ such that $f'(c) = 0$. Notice that every $z \in (a, b)$
satisfies either j) or jj) when $f(a) = f(b)$.

The following result (see Boas [3]) is an easy consequence of the above remark. 


Corollary 4. Let $a < b$ and $f: [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable
on $[a, b]$. Assume that $f'(a) = f'(b)$. Then there exists a point $c \in (a, b)$ such that

\[ f'(c)(c - a) = f(c) - f(a). \]


Proof: A straightforward computation shows that Corollary 4 is true for $f$ if and
only if it is true for $g(x) = f(x) - x f'(a)$. Therefore we may assume, without loss
of generality, that $f'(a) = f'(b) = 0$. Define

\[ h(x) = \begin{cases} \dfrac{f(x) - f(a)}{x - a}, & x \ne a, \\[1ex] 0, & x = a. \end{cases} \]

The function $h$ is continuous on $[a, b]$, differentiable on $(a, b]$ and $h'(b) =
-h(b)/(b - a)$.

Assume that $h(b) \ne 0$. From $h(b)h'(b) < 0$ and $h(a) = 0$ we derive the existence
of $z \in (a, b)$ which satisfies either j) or jj). In the case when $h(b) = 0$
($= h(a)$) every point $z \in (a, b)$ will do the job. Hence, by Theorem 2 (Remark 3),
there exists $c \in (a, b)$ such that $h'(c) = 0$, and this implies the stated result. QED
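
As a quick sanity check of Corollary 4, consider the following worked instance. It is our own example with an assumed function and interval, not one given in the paper.

```latex
f(x) = x^{3} \ \text{on}\ [a, b] = [-1, 1], \qquad f'(-1) = f'(1) = 3 .
% The condition f'(c)(c - a) = f(c) - f(a) reads
3c^{2}(c + 1) = c^{3} + 1 = (c + 1)(c^{2} - c + 1) .
% For c \neq -1 this reduces to 2c^{2} + c - 1 = 0, i.e. c = 1/2 or c = -1, so
c = \tfrac{1}{2} \in (-1, 1), \qquad
f'(c)(c - a) = \tfrac{3}{4} \cdot \tfrac{3}{2} = \tfrac{9}{8} = f(c) - f(a) .
```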


Geometrical Applications of Theorem 1 and Theorem 2. We present three geometrical
applications. To allow for a visual representation of the results we do not
state them in their full generality. 


Application 1. Let $f: D(0, 1) \subset \mathbb{R}^2 \to \mathbb{R}^3$, $f(u, v) = (x(u, v), y(u, v), z(u, v))$ be
continuous on $D(0, 1)$ and differentiable on $B(0, 1)$ and let $G = \mathrm{Im} f$. Assume that
there exists a plane $p: ax + by + cz + d = 0$ such that $(x(u, v), y(u, v), z(u, v))
\in p$ for every $(u, v) \in S(0, 1)$. Then there is a point $(u_0, v_0) \in B(0, 1)$ such that the
tangent plane to the surface $G$ at the point $f(u_0, v_0)$ is parallel to $p$.


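Justification. By Theorem 1 (see Remark 1) the vector $v_1 = (a, b, c)$ is orthogonal
to

\[ \frac{\partial f}{\partial u}(\mathbf{u}_0) = \mathbf{p} \quad \text{and} \quad \frac{\partial f}{\partial v}(\mathbf{u}_0) = \mathbf{q}, \]

for some $\mathbf{u}_0 \in B(0, 1)$, $\mathbf{u}_0 = (u_0, v_0)$. The tangent plane to $G$ at $f(\mathbf{u}_0)$ is
$\{f(\mathbf{u}_0) + m\mathbf{p} + n\mathbf{q} : m, n \in \mathbb{R}\}$, which is obviously parallel to $p$.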


Application 2. Let $f: D(0, 1) \subset \mathbb{R}^2 \to \mathbb{R}^3$, $f(u, v) = (x(u, v), y(u, v), z(u, v))$ be
continuous on $D(0, 1)$ and differentiable on $B(0, 1)$. Denote by $G$ the surface
$G = \mathrm{Im} f$ and let $G_0 = f(\partial D(0, 1))$. Assume that there is a plane $p: ax + by + cz +
d = 0$ such that $G_0$ is on one side of $p$ and there is a point of $G$ on the other side
of $p$. Then the tangent plane to $G$ at some point $P \in G$ is parallel to $p$.


Justification. Let $\mathbf{u}_1 = (u_1, v_1) \in B(0, 1)$ be such that $f(\mathbf{u}_1)$ is on the other side of $p$
with respect to $G_0$. Then $(a, b, c) \cdot (f(\mathbf{u}) - f(\mathbf{u}_1))$ does not change sign on $\partial D(0, 1)$.
The conclusion follows from Theorem 2.


We illustrate this situation with an example.


Example 2. Let $f(u, v) = (u^2 + v^2 - u,\ u^2 + v,\ u^2 - v)$. Then $G_0 = \{(1 - u,\ u^2 + v,\ u^2 - v) : u^2 + v^2 = 1\}$
and $f(0, 0) = (0, 0, 0)$ are on opposite sides of the plane
$p: x + y + z = 1/2$. Hence there is a point $P$ on $G = \mathrm{Im} f$ where the tangent
plane is parallel to $p$. The point is $P = f(1/6, 0)$.
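
Example 2 can be checked numerically. The following Python sketch is an illustration under our own naming (`f`, `partials`, `plane_value` are hypothetical helpers): it verifies that both partial derivatives at $(1/6, 0)$ are orthogonal to the normal $(1, 1, 1)$ of the plane, and samples the boundary circle to confirm that $G_0$ lies on the side $x + y + z > 1/2$.

```python
import math
from fractions import Fraction


def f(u, v):
    """The surface of Example 2."""
    return (u * u + v * v - u, u * u + v, u * u - v)


def partials(u, v):
    """Partial derivatives of f with respect to u and v, computed by hand."""
    f_u = (2 * u - 1, 2 * u, 2 * u)
    f_v = (2 * v, 1, -1)
    return f_u, f_v


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_value(point):
    """x + y + z; the plane of Example 2 is the level set with value 1/2."""
    return sum(point)


normal = (1, 1, 1)

# Exact arithmetic at the claimed point (1/6, 0).
f_u, f_v = partials(Fraction(1, 6), Fraction(0))
print(dot(normal, f_u), dot(normal, f_v))

# Sample the boundary u^2 + v^2 = 1 and record the smallest value of x + y + z.
boundary_min = min(
    plane_value(f(math.cos(t), math.sin(t)))
    for t in (2 * math.pi * i / 3600 for i in range(3600))
)
print(boundary_min > 0.5, plane_value(f(0, 0)) < 0.5)
```

Since both dot products vanish, the tangent plane at $f(1/6, 0)$ is spanned by vectors orthogonal to $(1, 1, 1)$ and is therefore parallel to $p$.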


Application 3. Let $\mathbf{x}: [a, b] \to \mathbb{R}^3$ be continuous on $[a, b]$ and differentiable on
$(a, b)$, and let $P = \mathbf{x}(a) = (x(a), y(a), z(a))$, $Q = \mathbf{x}(b) = (x(b), y(b), z(b))$. Then
for every plane $p$ passing through the line $L$ joining $P$ with $Q$ there is a point
$c \in (a, b)$ such that the vector $\mathbf{x}'(c)$ is parallel to $p$. In particular, when the plane
$p$ is the one containing the origin, we obtain that $\mathbf{x}'(c)$ satisfies the equality

\[ \text{i)} \quad x(a)[y(b)z'(c) - z(b)y'(c)] + y(a)[z(b)x'(c) - x(b)z'(c)] + z(a)[x(b)y'(c) - y(b)x'(c)] = 0. \]


Justification. The first part is an immediate consequence of Theorem 1, since for
every plane passing through $L$ there is a vector $u$ orthogonal to $L$ and to the
plane. For the second part observe that the direction $v$ of a line orthogonal to $p$ is
given by the cross product of the two vectors $\mathbf{x}(a)$ and $\mathbf{x}(b)$, i.e. $v = \mathbf{x}(a) \times \mathbf{x}(b)$.
Thus there exists $c \in (a, b)$ such that $(\mathbf{x}(a) \times \mathbf{x}(b)) \cdot \mathbf{x}'(c) = 0$, which implies i). For
a different justification of the result presented in Application 3 see [2].


Open problem and conjecture. We conclude the paper with an open problem and
a conjecture. Theorem 1 and Theorem 2 remain valid if $\mathbb{R}^p$ is replaced by a
Hilbert space $H$. No changes are needed in the proof. They are also true when $\mathbb{R}^p$
is replaced by a Banach space $F$ with the vector $v$ substituted by a linear
continuous functional $\varphi$.

We conjecture that the theorems are false when $\mathbb{R}^n$ is replaced by an infinite-dimensional
Banach space $E$, because Proposition 2, which plays a key role in both
proofs, fails in $E$. In fact, the unit closed ball $D(0, 1)$ of $E$ is not compact.
Consequently, there exist continuous functions $f: D(0, 1) \to \mathbb{R}$ such that $\mathrm{Im} f$ is an
open interval, as illustrated by the following example.


Example 3. Let $H$ be the Hilbert space of square summable sequences of real
numbers and let $D$ be the disk of $H$ centered at the origin and with radius 1,
$D = D(0, 1)$. Define

\[ T: D \to H, \quad T(x) = T(x_1, x_2, \ldots) = \left(\sqrt{1 - \|x\|^2},\ x_1, x_2, \ldots\right). \]

The map $T$ does not have any fixed point on $D$. In fact, since $\|T(x)\| = 1$ for all
$x \in D$, every potential fixed point $x$ must be located on the boundary of $D$, i.e. $x$ is
fixed for $T$ only if $\|x\| = 1$. This implies $T(x) = (0, x_1, x_2, \ldots)$. Combining this result
with the equality $T(x) = x$ gives $x = 0$, against the assumption $\|x\| = 1$. The
fixed-point free map $T$ allows us to define the continuous function

\[ f: D \to \mathbb{R}, \quad f(x) = \frac{1}{\|x - T(x)\|}. \]

Let us show that the image of $f$ is the open half-line $(0.5, \infty)$.

We already know that $\|x - T(x)\| > 0$ for every $x \in D$. To verify that the
greatest lower bound (glb) of $\{\|x - T(x)\| : x \in D\}$ is 0 consider the elements
$x_n \in D(0, 1)$ whose entries after the $n$-th position are all 0, while the first $n$ are all
equal to $1/\sqrt{n}$:

\[ x_n = \left(\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}, 0, 0, \ldots\right). \]

Clearly $\|x_n\| = 1$ and $\|x_n - T(x_n)\| = \sqrt{2/n}$. Hence the greatest lower bound is 0.


To see that $\|x - T(x)\| < 2$ for every $x \in D$, notice that $\|x - T(x)\| = 2$ requires
$\|x\| = 1$ and $x = -T(x)$, i.e.

\[ (x_1, x_2, x_3, \ldots) = (0, -x_1, -x_2, \ldots). \]

The above equality implies $x = 0$, a contradiction with $\|x\| = 1$. To verify that the
least upper bound (lub) of $\{\|x - T(x)\| : x \in D\}$ is 2 consider the elements $y_n$ whose
entries after the $n$-th position are all 0, while the first $n$ are alternately equal to
$\pm 1/\sqrt{n}$:

\[ y_n = \left(\frac{1}{\sqrt{n}}, -\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \ldots, \frac{(-1)^{n+1}}{\sqrt{n}}, 0, 0, \ldots\right). \]

Then $\|y_n - T(y_n)\| = \sqrt{4 - (2/n)}$, which implies that the least upper bound is 2.
Hence the image of $f$ is the open half-line $(0.5, \infty)$.
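
The two norm computations in Example 3 are easy to confirm numerically. The sketch below is an illustration only: it represents finitely supported sequences as Python lists, and the helper names `T`, `distance_to_image`, `x_n` and `y_n` are our own.

```python
import math


def T(x):
    """Apply T to a finitely supported sequence: (sqrt(1 - ||x||^2), x_1, x_2, ...)."""
    norm_sq = sum(t * t for t in x)
    # Guard against tiny negative values caused by floating-point rounding.
    return [math.sqrt(max(0.0, 1.0 - norm_sq))] + list(x)


def distance_to_image(x):
    """Compute ||x - T(x)||, padding x with one trailing zero to match lengths."""
    padded = list(x) + [0.0]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(padded, T(x))))


def x_n(n):
    """First n entries equal to 1/sqrt(n), the rest zero."""
    return [1.0 / math.sqrt(n)] * n


def y_n(n):
    """First n entries alternately +1/sqrt(n) and -1/sqrt(n), the rest zero."""
    return [(-1) ** i / math.sqrt(n) for i in range(n)]


for n in (1, 10, 1000):
    print(n,
          distance_to_image(x_n(n)), math.sqrt(2.0 / n),
          distance_to_image(y_n(n)), math.sqrt(4.0 - 2.0 / n))
```

As $n$ grows, the first distance tends to 0 and the second to 2, so $f(x) = 1/\|x - T(x)\|$ takes arbitrarily large values and values arbitrarily close to $0.5$, without attaining $0.5$.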

It would be nice to have an example which shows that Theorems 1 and 2 fail in 
infinite dimension. So far we have been unable to construct it. 

In the References we mention other contributions (see [5] pg. 19, [6], [8], [9], 
[11]) regarding Rolle’s Theorem, the Mean Value Theorem and the Cauchy 
Generalized Mean Value Theorem. They are not directly related to this paper, but 
the reader may find them useful to get a better overview of the work done in this 
area. 


Reply to CD’s


“These are indeed exciting times in the world of Mathematics.” I would like to respond to the “Tale
of Two CD’s” by Dan Kennedy. “The winds of change are blowing through ... the curriculum” and
some of us feel, like the French citizens in the late 1930’s, that we might be better off without some of
the coming changes. I am a practicing mathematician with a dozen years’ experience writing simulations,
optimizations, and analyses in wireless and landline telephony, printed circuit board production, airline
fleet assignment, yield management, and maintenance delays. I also have considerable exposure to
Mathematics education as consumer and producer.

It is apt that he chooses the compact disc (CD) as his analog (pun intended) for the newest New Math.
The CD is truly a triumph of marketing over technology. It is quiet and cute, shiny and high-tech. If the
medium were truly digital, then the sound wouldn’t be dramatically altered by putting a rubber mat on
top, painting the rim green, or reversing the prongs of the AC cord. By stuffing thousands of dollars of
digital signal processors (DSPs) into the signal path, clever engineers have surpassed cheap turntables
to the point where the best $10,000 CD players outperform $1000 turntables. But, of course, you’re
listening to the DSPs rather than the CD.

Those of us who keep concert seats year after year in spite of the surface noise (audience rustling)
and clicks and pops (coughs and sneezes) tend also to find ourselves labelled as “collectors” and
“Luddites” as we continue to purchase records. I have over two thousand phonograph records and a
Linn, LOCI, and EK-1 to play them. The huge advantage of CD over record is the low manufacturing
cost which should have brought the consumer cheap recordings, but somehow this never happened.

The educational analogy to “compact” sound is a simplified curriculum relying on technology to
replace the drudgery of traditional teaching methods. We are offering better high school mathematics
programs than before, alas, to college students and, occasionally, to graduate students. Reducing
student involvement in math courses has failed to attract better or more motivated students to our 
classrooms; did we really expect it to do so? 

In our Brave New World (Aldous Huxley, 1932) of post-Modern education, the emphasis is on 
maintaining the students’ willingness to enroll in our courses and come to our classes. We must 
entertain them and we mustn’t scare them away so machines do their “timeses and gazintas” and solve 
equations for them and invert matrices for them and even graph functions for them. Being able to 
balance a checkbook without a machine is A Sense of Power (Asimov, 1957) in today’s Mathematics 
classroom. 

Mathematics is not a spectator sport; we learn it by doing it. While my Linear Programming students 
this fall will learn to use AMPL modeling language, they also will graph polytopes and crank out 
Simplex optimizations by hand. 

Do I suppose Newton would be flattered to see our students walking a road to discovery essentially
the same as his? I certainly do. I know I’m flattered to see my own discovery process (including my
software) used ten years later to teach new students in cellular mobile telephone system engineering, as
its success in pedagogy affirms my own confidence in my knowledge. Isaac Newton’s results are
certainly more than thirty times as worthy of posterity as mine, and Calculus and Physics students
should see them his way.

Our Mathematics education and curriculum certainly could use a dose of enthusiasm and support 
from both teacher and students, but I doubt it requires much revision. Computational and display tools 
can enhance and deepen our insights and our delight, but we must remember that students learn by 
traveling the road to discovery with their own eyes, ears, and limbs and not by watching machines (or 
professors) do it for them. 


Adam N. Rosenberg 

14061 Oakgreen Circle South 
Afton, MN 55001 

Adam@Psionic.mn.org
NOTES


Edited by: John Duncan 


Adding Distinct Congruence Classes 
Modulo a Prime 


Noga Alon, Melvyn B. Nathanson and Imre Ruzsa 


1. THE ERDŐS-HEILBRONN CONJECTURE. The Cauchy-Davenport theorem
[3, 4] states that if $A$ and $B$ are nonempty sets of congruence classes modulo a
prime $p$, and if $|A| = k$ and $|B| = l$, then the sumset

\[ A + B = \{a + b \mid a \in A,\ b \in B\} \]

contains at least $\min(p, k + l - 1)$ congruence classes. It follows that the sumset
$A + A$ contains at least $\min(p, 2k - 1)$ congruence classes. Erdős and Heilbronn
conjectured 30 years ago that there are at least $\min(p, 2k - 3)$ congruence classes
that can be written as the sum of two distinct elements of $A$. Erdős has frequently
mentioned this problem in his lectures and papers (for example, Erdős-Graham
[6, p. 95]). Applying results from exterior algebra and the representation theory of
the symmetric group, Dias da Silva and Hamidoune [5] recently proved this
conjecture. The purpose of this paper is to give a simple proof of the Erdős-Heilbronn
conjecture that uses only the most elementary properties of polynomials.
The method, in fact, yields generalizations of both the Erdős-Heilbronn
conjecture and the Cauchy-Davenport theorem.


2. THE POLYNOMIAL METHOD 


Lemma 1 (Alon-Tarsi [2]). Let $A$ and $B$ be nonempty subsets of a field $F$ with 
$|A| = k$ and $|B| = l$. Let $f(x, y)$ be a polynomial with coefficients in $F$ and of degree 
at most $k - 1$ in $x$ and $l - 1$ in $y$. If $f(a, b) = 0$ for all $a \in A$ and $b \in B$, then 
$f(x, y)$ is identically zero. 


Proof: This follows immediately from the fact that a nonzero polynomial $p(x) \in F[x]$ 
of degree at most $k - 1$ cannot have $k$ distinct roots in $F$. We can write 

\[ f(x, y) = \sum_{i=0}^{k-1} \sum_{j=0}^{l-1} f_{i,j}\, x^i y^j = \sum_{i=0}^{k-1} v_i(y)\, x^i, \]

where 

\[ v_i(y) = \sum_{j=0}^{l-1} f_{i,j}\, y^j \]

is a polynomial of degree at most $l - 1$ in $y$. Fix $b \in B$. Then 

\[ u(x) = f(x, b) = \sum_{i=0}^{k-1} v_i(b)\, x^i \]

is a polynomial of degree at most $k - 1$ in $x$ such that $u(a) = 0$ for all $a \in A$. 
Since $u(x)$ has at least $k$ distinct roots, it follows that $u(x)$ is the zero polynomial, 
and so $v_i(b) = 0$ for all $b \in B$. Since $\deg(v_i) \le l - 1$ and $|B| = l$, it follows that 
$v_i(y)$ is the zero polynomial, and so $f_{i,j} = 0$ for all $i$ and $j$. This completes the 
proof. □ 


Lemma 2. Let $A$ be a finite subset of a field $F$, and let $|A| = k$. For every $m \ge 0$ 
there exists a polynomial $r_m(x) \in F[x]$ of degree at most $k - 1$ such that 

\[ r_m(a) = a^m \]

for all $a \in A$. 


Proof: We shall give two proofs. Let $A = \{a_0, a_1, \ldots, a_{k-1}\}$. We must show that 
there exists a polynomial $r_m(x) = z_0 + z_1 x + \cdots + z_{k-1} x^{k-1} \in F[x]$ such that 

\[ r_m(a_i) = z_0 + z_1 a_i + z_2 a_i^2 + \cdots + z_{k-1} a_i^{k-1} = a_i^m \]

for $i = 0, 1, \ldots, k - 1$. This is a system of $k$ linear equations in the $k$ unknowns 
$z_0, z_1, \ldots, z_{k-1}$, and it has a solution if the determinant of the coefficients of the 
unknowns is nonzero. The Lemma follows immediately from the observation that 
this determinant is the Vandermonde determinant 

\[ \begin{vmatrix} 1 & a_0 & a_0^2 & \cdots & a_0^{k-1} \\ 1 & a_1 & a_1^2 & \cdots & a_1^{k-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_{k-1} & a_{k-1}^2 & \cdots & a_{k-1}^{k-1} \end{vmatrix} = \prod_{0 \le i < j \le k-1} (a_j - a_i) \ne 0. \]

The second proof is even simpler. Let 

\[ t(x) = \prod_{i=0}^{k-1} (x - a_i). \]

Then $t(x) \in F[x]$ and $\deg(t) = k$. By the division algorithm for polynomials over a 
field, for every $m \ge 0$ there exist polynomials $q_m(x)$ and $r_m(x)$ such that 
$\deg(r_m) \le k - 1$ and 

\[ x^m = t(x)\, q_m(x) + r_m(x). \]

Then 

\[ a_i^m = t(a_i)\, q_m(a_i) + r_m(a_i) = r_m(a_i) \]

for all $a_i \in A$. This completes the proof. □ 
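
The second proof of Lemma 2 is effectively an algorithm: reduce $x^m$ modulo $t(x)$. The following Python sketch carries this out over $\mathbb{Z}/p\mathbb{Z}$; the function names `poly_mod_power` and `evaluate` and the sample set are illustrative choices, not part of the paper.

```python
def poly_mod_power(m, roots, p):
    """Coefficients (lowest degree first) of r_m(x) = x^m mod t(x) over Z/pZ,
    where t(x) is the product of (x - a) over the distinct residues in roots."""
    t = [1]
    for a in roots:
        # Multiply t(x) by (x - a).
        nxt = [0] * (len(t) + 1)
        for i, c in enumerate(t):
            nxt[i] = (nxt[i] - a * c) % p
            nxt[i + 1] = (nxt[i + 1] + c) % p
        t = nxt
    k = len(roots)
    r = [1] + [0] * (k - 1)          # r(x) = x^0
    for _ in range(m):
        top = r[-1]                  # coefficient that would move to x^k
        r = [0] + r[:-1]             # multiply r(x) by x
        for i in range(k):           # subtract top * t(x); t is monic
            r[i] = (r[i] - top * t[i]) % p
    return r


def evaluate(coeffs, x, p):
    """Horner evaluation of a polynomial given lowest degree first."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % p
    return value


A = [2, 5, 7, 11]
r9 = poly_mod_power(9, A, 13)
agrees_on_A = all(evaluate(r9, a, 13) == pow(a, 9, 13) for a in A)
```

Here `r9` has at most $k = 4$ coefficients, so its degree is at most $k - 1$, and `agrees_on_A` records whether it matches $x^9$ on every element of the sample set.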


Theorem 1. Let $p$ be a prime number, and let $F = \mathbb{Z}/p\mathbb{Z}$. Let $A$ and $B$ be nonempty 
subsets of the field $F$ such that $|A| \ne |B|$. Let 

\[ C = \{a + b \mid a \in A,\ b \in B,\ a \ne b\}. \]

Then 

\[ |C| \ge \min(p, |A| + |B| - 2). \]

Proof: Let $|A| = k$ and $|B| = l$. We can assume that 

\[ 1 \le l < k \le p. \]

If $k + l - 2 > p$, let $l' = p - k + 2$. Then 

\[ 2 \le l' < l < k \]

and 

\[ k + l' - 2 = p. \]

Choose $B' \subseteq B$ such that $|B'| = l'$, and let 

\[ C' = \{a + b' \mid a \in A,\ b' \in B',\ a \ne b'\}. \]

Then $C' \subseteq C$. If the Theorem holds for the sets $A$, $B'$, and $C'$, then 

\[ |C| \ge |C'| \ge k + l' - 2 = p = \min(p, |A| + |B| - 2). \]

Therefore, we can assume that 

\[ k + l - 2 \le p. \]

We must prove that 

\[ |C| \ge k + l - 2. \]

Suppose that 

\[ |C| \le k + l - 3. \]

Choose $w$ so that 

\[ w + |C| = k + l - 3. \]
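We construct the polynomial $f(x, y)$ in $F[x, y]$ as follows: Let 

\[ f(x, y) = (x - y)(x + y)^w \prod_{c \in C} (x + y - c). \]

Then $f$ has total degree exactly $k + l - 2$, and 

\[ f(a, b) = 0 \quad \text{for all } a \in A,\ b \in B. \]

Moreover, 

\[ f(x, y) = \sum_{\substack{i, j \ge 0 \\ i + j \le k + l - 2}} f_{i,j}\, x^i y^j = (x - y)(x + y)^{k+l-3} + \text{lower order terms}. \]

Since $1 \le l < k \le p$ and $1 \le k + l - 3 \le p - 1$, it follows that the coefficient 
$f_{k-1,l-1}$ of the monomial $x^{k-1} y^{l-1}$ in $f(x, y)$ is 

\[ \binom{k+l-3}{k-2} - \binom{k+l-3}{k-1} = \frac{(k - l)(k + l - 3)!}{(k-1)!\,(l-1)!} \not\equiv 0 \pmod{p}. \]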


By Lemma 2, for every $m \ge k$ there exists a polynomial $r_m(x)$ of degree at most 
$k - 1$ such that $r_m(a) = a^m$ for all $a \in A$, and for every $n \ge l$ there exists a 
polynomial $s_n(y)$ of degree at most $l - 1$ such that $s_n(b) = b^n$ for all $b \in B$. We 
use the polynomials $r_m(x)$ and $s_n(y)$ to construct a new polynomial $f^*(x, y)$ from 
$f(x, y)$ as follows: If $x^m y^n$ is a monomial in $f(x, y)$ with $m \ge k$, then we replace 
$x^m y^n$ with $r_m(x) y^n$. Since $\deg(f) = k + l - 2$, it follows that if $m \ge k$, then 
$n \le l - 2$, and so $r_m(x) y^n$ is a sum of monomials $x^i y^j$ with $i \le k - 1$ and 
$j \le l - 2$. Similarly, if $x^m y^n$ is a monomial in $f(x, y)$ with $n \ge l$, then we replace 
$x^m y^n$ with $x^m s_n(y)$. If $n \ge l$, then $m \le k - 2$, and so $x^m s_n(y)$ is a sum of 
monomials $x^i y^j$ with $i \le k - 2$ and $j \le l - 1$. This determines a new polynomial 
$f^*(x, y)$ of degree at most $k - 1$ in $x$ and $l - 1$ in $y$. The process of constructing 
$f^*(x, y)$ from $f(x, y)$ does not alter the coefficient $f_{k-1,l-1}$ of the term $x^{k-1} y^{l-1}$, 
since this monomial does not occur in any of the polynomials $r_m(x) y^n$ or $x^m s_n(y)$. 
On the other hand, 

\[ f^*(a, b) = f(a, b) = 0 \]

for all $a \in A$ and $b \in B$. It follows immediately from Lemma 1 that the polynomial 
$f^*(x, y)$ is identically zero. This contradicts the fact that the coefficient 
$f_{k-1,l-1}$ of $x^{k-1} y^{l-1}$ in $f^*(x, y)$ is nonzero, and completes the proof. □ 


Theorem 2 (Dias da Silva-Hamidoune). Let $p$ be a prime number, and let $F = \mathbb{Z}/p\mathbb{Z}$. 
Let $A \subseteq F$, and let $|A| = k \ge 2$. Let $2^{\wedge}A$ denote the set of all sums of two distinct 
elements of $A$. Then 

\[ |2^{\wedge}A| \ge \min(p, 2k - 3). \]

Proof: Let $A \subseteq F$, $|A| \ge 2$. Choose $a \in A$, and let $B = A \setminus \{a\}$. Then 
$|B| = |A| - 1$. Let 

\[ C = \{a + b \mid a \in A,\ b \in B,\ a \ne b\}. \]

Then $2^{\wedge}A \supseteq C$ and, by Theorem 1, 

\[ |2^{\wedge}A| \ge |C| \ge \min(p, |A| + |B| - 2) = \min(p, 2|A| - 3). \]

This completes the proof of the Erdős-Heilbronn conjecture. □ 


Let $k + l - 2 \le p$, $1 \le l < k \le p$. Let $A = \{0, 1, 2, \ldots, k - 1\}$ and 
$B = \{0, 1, 2, \ldots, l - 1\}$. Then 

\[ C = \{a + b \mid a \in A,\ b \in B,\ a \ne b\} = \{1, 2, \ldots, k + l - 2\} \]

and 

\[ 2^{\wedge}A = \{1, 2, \ldots, 2k - 3\}. \]

This example shows that the lower bounds in Theorem 1 and Theorem 2 are sharp. 
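
For small primes, Theorem 1 and the sharpness example can be confirmed by exhaustive search. The sketch below is only a sanity check under assumed names (`restricted_sumset`, `check_theorem_1`); it plays no role in the proof.

```python
from itertools import combinations


def restricted_sumset(A, B, p):
    """All residues a + b (mod p) with a in A, b in B and a != b."""
    return {(a + b) % p for a in A for b in B if a != b}


def check_theorem_1(p):
    """True if |C| >= min(p, |A| + |B| - 2) for all nonempty A, B with |A| != |B|."""
    subsets = [s for size in range(1, p + 1) for s in combinations(range(p), size)]
    for A in subsets:
        for B in subsets:
            if len(A) != len(B):
                C = restricted_sumset(A, B, p)
                if len(C) < min(p, len(A) + len(B) - 2):
                    return False
    return True


# Sharpness example with k = 5, l = 3, p = 11.
k, l, p = 5, 3, 11
A = range(k)
B = range(l)
C = restricted_sumset(A, B, p)          # expected {1, ..., k + l - 2}
two_wedge_A = restricted_sumset(A, A, p)  # expected {1, ..., 2k - 3}
```

The exhaustive check is exponential in $p$ and is meant only for very small primes such as 5 or 7.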

3. FURTHER APPLICATIONS OF THE METHOD. The polynomial method is a 
powerful new technique to obtain results in additive number theory. For example, 
it gives the following simple proof of the Cauchy-Davenport theorem. 


Theorem 3 (Cauchy-Davenport). Let $p$ be a prime number, and let $F = \mathbb{Z}/p\mathbb{Z}$. Let 
$A$ and $B$ be nonempty subsets of the field $F$, and let 

\[ C = A + B = \{a + b \mid a \in A,\ b \in B\}. \]

Then 

\[ |C| \ge \min(p, |A| + |B| - 1). \]

Proof: Let $|A| = k$ and $|B| = l$. We can assume that $k + l - 1 \le p$. If 
$|C| \le k + l - 2$, choose $w$ so that 

\[ w + |C| = k + l - 2, \]

and consider the polynomial 

\[ f(x, y) = (x + y)^w \prod_{c \in C} (x + y - c). \]

Then $f(a, b) = 0$ for all $a \in A$ and $b \in B$. The polynomial has total degree 
$k + l - 2$, and the coefficient of the monomial $x^{k-1} y^{l-1}$ is exactly 

\[ \binom{k+l-2}{k-1} \not\equiv 0 \pmod{p}. \]

The proof proceeds exactly as the proof of Theorem 1. □ 


As a final example of the method, we state and prove the following new result. 


Theorem 4. Let $A$ and $B$ be nonempty subsets of $F = \mathbb{Z}/p\mathbb{Z}$, and let 

\[ C = \{a + b \mid a \in A,\ b \in B,\ ab \ne 1\}. \]

Let $|A| = k$ and $|B| = l$. Then 

\[ |C| \ge \min(p, k + l - 3). \]

Proof: If $k + l - 3 > p$, let $l' = p - k + 3$. Then $3 \le l' < l$. Choose $B' \subseteq B$ such 
that $|B'| = l'$ and let 

\[ C' = \{a + b' \mid a \in A,\ b' \in B',\ ab' \ne 1\}. \]

Since $C' \subseteq C$, it suffices to prove that $|C'| \ge k + l' - 3$. Equivalently, we can 
assume that $k + l - 3 \le p$, and we must prove that $|C| \ge k + l - 3$. 
Suppose that $|C| \le k + l - 4$. Choose $w$ so that 

\[ w + |C| = k + l - 4, \]

and consider the polynomial 

\[ f(x, y) = (xy - 1)(x + y)^w \prod_{c \in C} (x + y - c). \]

Then $f(a, b) = 0$ for all $a \in A$ and $b \in B$. The polynomial has total degree 
$k + l - 2$, and the coefficient of the monomial $x^{k-1} y^{l-1}$ is 

\[ \binom{k+l-4}{k-2} \not\equiv 0 \pmod{p}. \]

The proof continues exactly as the proof of Theorem 1. □ 


Let $k + l - 1 \le p$, let $k, l \ge 2$, and choose $d \in \mathbb{Z}/p\mathbb{Z}$, $d \ne 0$, such that 

\[ (1 + (k - 1)d)(1 + (l - 1)d) = 1. \]

Let $A = \{1, 1 + d, 1 + 2d, \ldots, 1 + (k - 1)d\}$ and 
$B = \{1, 1 + d, 1 + 2d, \ldots, 1 + (l - 1)d\}$. Define $C$ as in Theorem 4. Then 
$C = \{2 + id \mid i = 1, \ldots, k + l - 3\}$. 
This example shows that the lower bound in Theorem 4 is sharp for $k, l \ge 2$. If 
$k = 1$, the correct lower bound is $|B| - 1 = k + l - 2$. 

4. REMARKS. The results in this paper hold for addition of finite subsets of any 
field $F$, where $p$ denotes the characteristic of $F$ if the characteristic is a prime 
number, and $p = \infty$ if the characteristic is zero. 

Dias da Silva and Hamidoune [5] proved the following generalization of the 
Erdős-Heilbronn conjecture for $h$-fold sums: Let $h \ge 2$, and let $h^{\wedge}A$ denote the 
set of all sums of $h$ distinct elements of $A$. If $A \subseteq \mathbb{Z}/p\mathbb{Z}$ and $|A| = k$, then 

\[ |h^{\wedge}A| \ge \min(p, hk - h^2 + 1). \]

This result can also be proved by the polynomial method, and we shall present this 
and other results in a subsequent paper [1]. 

Nathanson [9] contains proofs of the Cauchy-Davenport theorem and some of 
its generalizations, as well as a full exposition of the original Dias da Silva-Hamidoune 
proof of the Erdős-Heilbronn conjecture for $h$-fold sums, and the 
polynomial proof. Partial results on the Erdős-Heilbronn conjecture had previously 
been obtained by Rickert [11], Mansfield [8], Rödseth [12], Pyber [10], and 
Freiman, Low, and Pitman [7]. 


REFERENCES 


1. N. Alon, M. B. Nathanson, and I. Z. Ruzsa. The polynomial method and restricted sums of 
congruence classes. J. Number Theory, to appear. 

2. N. Alon and M. Tarsi. Colorings and orientations of graphs. Combinatorica, 12:125-134, 1992. 

3. A. L. Cauchy. Recherches sur les nombres. J. École polytech., 9:99-116, 1813. 

4. H. Davenport. On the addition of residue classes. J. London Math. Soc., 10:30-32, 1935. 

5. J. A. Dias da Silva and Y. O. Hamidoune. Cyclic spaces for Grassmann derivatives and additive 
theory. Bull. London Math. Soc., 26:140-146, 1994. 

6. P. Erdős and R. L. Graham. Old and New Problems and Results in Combinatorial Number Theory. 
L’Enseignement Mathématique, Geneva, 1980. 

7. G. A. Freiman, L. Low, and J. Pitman. The proof of Paul Erdős’ conjecture of the addition of 
different residue classes modulo prime number. In Structure Theory of Set Addition, 7-11 June 
1993, CIRM Marseille, pages 99-108, 1993. 

8. R. Mansfield. How many slopes in a polygon? Israel J. Math., 39:265-272, 1981. 

9. M. B. Nathanson. Additive Number Theory: 2. Inverse Theorems and the Geometry of Sumsets. 
Graduate Texts in Mathematics, Springer-Verlag, New York, 1995. 

10. L. Pyber. On the Erdős-Heilbronn conjecture. Personal communication. 

11. U.-W. Rickert. Über eine Vermutung in der additiven Zahlentheorie. Ph.D. thesis, Tech. Univ. 
Braunschweig, 1976. 

12. O. J. Rödseth. Sums of distinct residues mod p. Acta Arith., 65:181-184, 1993. 


ALON: 
Institute for Advanced Study 
Princeton, NJ 08540 
and 
Department of Mathematics 
Tel Aviv University 
Tel Aviv, Israel 
noga@math.tau.ac.il 


NATHANSON: 
Department of Mathematics 
Lehman College (CUNY) 
Bronx, NY 10468 
nathansn@dimacs.rutgers.edu 


RUZSA: 
Mathematical Institute 
of the Hungarian Academy of Sciences 
Budapest, P.O.B. 127 
H-1364, HUNGARY 
h1140ruz@ella.hu 




A Simple Proof of the Hölder 
and the Minkowski Inequality 


Lech Maligranda 


The proofs as well as the extensions, inverses and applications of the well-known 
Hölder and Minkowski inequalities can be found in many books about real 
functions, analysis, functional analysis or $L_p$-spaces (cf. [Mi]). The aim of this note 
is to give another proof of these classical inequalities. The following lemma will be 
a main step in our simple proof of these inequalities. This lemma was motivated by 
considerations in [KPS], [M] and [MP]. 


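Lemma. For $1 < p < \infty$ and any $a, b > 0$, we have 

\[ \text{(i)} \quad \inf_{t > 0} \left[ \frac{1}{p}\, t^{1/p - 1} a + \left(1 - \frac{1}{p}\right) t^{1/p}\, b \right] = a^{1/p}\, b^{1 - 1/p}, \]

\[ \text{(ii)} \quad \inf_{0 < t < 1} \left[ t^{1-p} a^p + (1 - t)^{1-p}\, b^p \right] = (a + b)^p. \]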


First proof. In these proofs we will use calculus. 
(i) Let, for $t > 0$, the function $f$ be defined by 

\[ f(t) = \frac{1}{p}\, t^{1/p - 1} a + \left(1 - \frac{1}{p}\right) t^{1/p}\, b. \]

Then the derivative $f'$ satisfies 

\[ f'(t) = \frac{1}{p}\left(\frac{1}{p} - 1\right) t^{1/p - 2} a + \left(1 - \frac{1}{p}\right)\frac{1}{p}\, t^{1/p - 1} b = \frac{1}{p}\left(\frac{1}{p} - 1\right) t^{1/p - 2}\,(a - tb), \]

and so $f'$ is negative for $t < t_0 = a/b$, zero for $t = t_0$ and positive for $t > t_0$. 
Hence, $f$ has its minimum at the point $t_0 = a/b$ and this minimum is equal to 

\[ f(t_0) = f\!\left(\frac{a}{b}\right) = \frac{1}{p}\left(\frac{a}{b}\right)^{1/p - 1} a + \left(1 - \frac{1}{p}\right)\left(\frac{a}{b}\right)^{1/p} b = a^{1/p}\, b^{1 - 1/p}. \]

(ii) Let, for $0 < t < 1$, the function $g$ be defined by 

\[ g(t) = t^{1-p} a^p + (1 - t)^{1-p}\, b^p. \]

Then the derivative $g'$ satisfies the equation 

\[ g'(t) = (1 - p)\, t^{-p} a^p - (1 - p)(1 - t)^{-p}\, b^p = 0 \]

only when $t = t_0 = a/(a + b)$. Since 

\[ g''(t) = (1 - p)(-p)\, t^{-p-1} a^p + (1 - p)(-p)(1 - t)^{-p-1}\, b^p > 0, \]

it follows that $g$ has its local minimum at $t_0 = a/(a + b)$, which is equal to 

\[ g(t_0) = g\!\left(\frac{a}{a+b}\right) = \left(\frac{a}{a+b}\right)^{1-p} a^p + \left(\frac{b}{a+b}\right)^{1-p} b^p = \left[\frac{a}{a+b} + \frac{b}{a+b}\right](a + b)^p = (a + b)^p. \]

This local minimum of the function $g$ is equal to its global minimum because $g$ is 
continuous on $(0, 1)$ and $\lim_{t \to 0^+} g(t) = \lim_{t \to 1^-} g(t) = +\infty$. 
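
Both parts of the Lemma are easy to spot-check numerically. The sketch below evaluates $f$ and $g$ at the claimed minimizers and on a coarse grid; the sample values of $a$, $b$, $p$ and the grid spacing are arbitrary choices made here for illustration.

```python
import math


def f(t, a, b, p):
    """The function from part (i) of the Lemma."""
    return (1 / p) * t ** (1 / p - 1) * a + (1 - 1 / p) * t ** (1 / p) * b


def g(t, a, b, p):
    """The function from part (ii) of the Lemma."""
    return t ** (1 - p) * a ** p + (1 - t) ** (1 - p) * b ** p


a, b, p = 3.0, 5.0, 2.5
f_min = f(a / b, a, b, p)            # claimed minimum a^(1/p) * b^(1 - 1/p)
g_min = g(a / (a + b), a, b, p)      # claimed minimum (a + b)^p

# Coarse grid searches; no grid value should fall below the claimed minima.
grid_f = min(f(0.01 * i, a, b, p) for i in range(1, 1000))
grid_g = min(g(0.001 * i, a, b, p) for i in range(1, 1000))
```

Floating-point rounding means the comparisons should be made with a small tolerance rather than exact equality.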


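Second proof. In these proofs we will use convexity of some functions. 

(i) The function $\varphi(u) = \exp(u)$ is convex on $\mathbb{R}$. Thus 

\[ a^{1/p}\, b^{1 - 1/p} = \left[t^{1/p - 1} a\right]^{1/p} \left[t^{1/p}\, b\right]^{1 - 1/p} = \exp\!\left[\frac{1}{p} \ln\!\left(t^{1/p - 1} a\right) + \left(1 - \frac{1}{p}\right) \ln\!\left(t^{1/p}\, b\right)\right] \]

\[ \le \frac{1}{p} \exp\!\left[\ln\!\left(t^{1/p - 1} a\right)\right] + \left(1 - \frac{1}{p}\right) \exp\!\left[\ln\!\left(t^{1/p}\, b\right)\right] = \frac{1}{p}\, t^{1/p - 1} a + \left(1 - \frac{1}{p}\right) t^{1/p}\, b \]

for every $t > 0$. For $t = a/b$ we have equality. 

(ii) The function $\varphi(u) = u^p$ for $p > 1$ is convex on $[0, \infty)$. Therefore, 

\[ (a + b)^p = \left[t\,\frac{a}{t} + (1 - t)\,\frac{b}{1 - t}\right]^p \le t \left(\frac{a}{t}\right)^p + (1 - t)\left(\frac{b}{1 - t}\right)^p = t^{1-p} a^p + (1 - t)^{1-p}\, b^p \]

for every $0 < t < 1$. For $t = a/(a + b)$ we have equality. 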


Remark 1. If $0 < p < 1$ and we change in the equalities (i) and (ii) the infimum 
into supremum, then our Lemma is still true. 


Remark 2. The second proof of (i) gives also a different proof of the 
arithmetic-geometric mean inequality 

\[ a^{1/p}\, b^{1 - 1/p} \le \frac{1}{p}\, a + \left(1 - \frac{1}{p}\right) b \]

(put $t = 1$) as well as a different proof of the Young inequality 

\[ ab \le \frac{1}{p}\, a^p + \left(1 - \frac{1}{p}\right) b^{p/(p-1)}. \]


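The classical Hölder inequality states: Let $1 < p < \infty$ and $1/p + 1/q = 1$. If 
$x \in L_p(\mu)$ and $y \in L_q(\mu)$, then $xy \in L_1(\mu)$ and 

\[ \text{(HI)} \qquad \|xy\|_1 \le \|x\|_p\, \|y\|_q. \]

Equivalently, if $x, y \in L_1(\mu)$, then $|x|^{1/p} |y|^{1 - 1/p} \in L_1(\mu)$ and 

\[ (\text{HI}_1) \qquad \left\| |x|^{1/p} |y|^{1 - 1/p} \right\|_1 \le \|x\|_1^{1/p}\, \|y\|_1^{1 - 1/p}. \]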




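Proof: According to our Lemma the inequality 

\[ a^{1/p}\, b^{1 - 1/p} \le \frac{1}{p}\, t^{1/p - 1} a + \left(1 - \frac{1}{p}\right) t^{1/p}\, b \]

holds for all $t > 0$ and it follows that 

\[ \left\| |x|^{1/p} |y|^{1 - 1/p} \right\|_1 = \int_\Omega |x(s)|^{1/p}\, |y(s)|^{1 - 1/p}\, d\mu(s) \]

\[ \le \int_\Omega \left[\frac{1}{p}\, t^{1/p - 1} |x(s)| + \left(1 - \frac{1}{p}\right) t^{1/p}\, |y(s)|\right] d\mu(s) \]

\[ = \frac{1}{p}\, t^{1/p - 1} \int_\Omega |x(s)|\, d\mu(s) + \left(1 - \frac{1}{p}\right) t^{1/p} \int_\Omega |y(s)|\, d\mu(s) \]

\[ = \frac{1}{p}\, t^{1/p - 1} \|x\|_1 + \left(1 - \frac{1}{p}\right) t^{1/p}\, \|y\|_1. \]

Taking the infimum over all $t > 0$ and using our Lemma again we obtain 

\[ \left\| |x|^{1/p} |y|^{1 - 1/p} \right\|_1 \le \|x\|_1^{1/p}\, \|y\|_1^{1 - 1/p}, \]

which proves inequality $(\text{HI}_1)$. 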


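Remark 3. Our proof of $(\text{HI}_1)$ still works for a general Banach function space 
$X(\mu)$ instead of the $L_1(\mu)$-space, i.e., if $x, y \in X(\mu)$, then $|x|^{1/p} |y|^{1 - 1/p} \in X(\mu)$ 
and 

\[ (\text{HI}_1) \qquad \left\| |x|^{1/p} |y|^{1 - 1/p} \right\|_X \le \|x\|_X^{1/p}\, \|y\|_X^{1 - 1/p}. \]

Equivalently (cf. [MP]), if $|x|^p \in X(\mu)$ and $|y|^q \in X(\mu)$, $1/p + 1/q = 1$, then 
$xy \in X(\mu)$ and 

\[ \text{(HI)} \qquad \|xy\|_X \le \left\| |x|^p \right\|_X^{1/p}\, \left\| |y|^q \right\|_X^{1/q}. \]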


The classical Minkowski inequality states: Let $1 \le p < \infty$. If $x, y \in L_p(\mu)$, then 
$x + y \in L_p(\mu)$ and 

\[ \text{(MI)} \qquad \|x + y\|_p \le \|x\|_p + \|y\|_p. \]


Proof: By using the second part of our Lemma, i.e. the inequality 

\[ (a + b)^p \le t^{1-p} a^p + (1 - t)^{1-p}\, b^p, \]

we find that for all $t$, $0 < t < 1$, 

\[ \|x + y\|_p^p = \int_\Omega |x(s) + y(s)|^p\, d\mu(s) \le \int_\Omega \left[|x(s)| + |y(s)|\right]^p d\mu(s) \]

\[ \le \int_\Omega \left[t^{1-p} |x(s)|^p + (1 - t)^{1-p} |y(s)|^p\right] d\mu(s) \]

\[ = t^{1-p} \int_\Omega |x(s)|^p\, d\mu(s) + (1 - t)^{1-p} \int_\Omega |y(s)|^p\, d\mu(s) \]

\[ = t^{1-p}\, \|x\|_p^p + (1 - t)^{1-p}\, \|y\|_p^p. \]

Taking the infimum over $0 < t < 1$ and using our Lemma again we obtain 

\[ \|x + y\|_p^p \le \left(\|x\|_p + \|y\|_p\right)^p, \]

which is inequality (MI). 

Never Too Late 


There is a slip in Williamson’s excellent article [2]. Although nearly 50 years have 
gone by, it is never too late to restore the article to perfection. 

The 8 by 8 determinant displayed on page 433 has the value 44, instead of 56 as 
stated. 

From the material in the article that immediately precedes one can deduce that 56 
is attainable by appropriately bordering the incidence matrix of the seven point 
projective plane: 

0 1 
1 od 
1 0 
1 0 
1 1 
1 1 
1 O 
1 O 


moO RF OOF 
COD CRrR Be 
mm OOODH me 
m= OR OOrFRC., 


0 


Some follow-up thoughts arise at once but this is not the place to explore them. 
Years later Ehlich and Zeller [1] showed that 56 is the largest possible value for the 
determinant of an 8 by 8 matrix consisting entirely of 0’s and 1’s. 
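
A quick exact check that a bordered incidence matrix reaches 56 could look like the sketch below. The labeling of the plane (lines taken as translates of {0, 1, 3} modulo 7) and the border layout (a zero corner with ones in the rest of the first row and column) are assumptions chosen for illustration, not necessarily the matrix displayed in the note.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant of an integer matrix by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n = len(rows)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det                      # a row swap flips the sign
        det *= rows[col][col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n):
                rows[r][c] -= factor * rows[col][c]
    return int(det)


# Incidence matrix of the seven point plane: entry (i, j) is 1 when point j
# lies on line i, with line i assumed to be {i, i + 1, i + 3} mod 7.
fano = [[1 if (j - i) % 7 in (0, 1, 3) else 0 for j in range(7)] for i in range(7)]

# Border with a zero corner and ones along the first row and column.
bordered = [[0] + [1] * 7] + [[1] + row for row in fano]

fano_det = determinant(fano)
bordered_det = determinant(bordered)
```

The sign of each determinant depends on how points and lines are ordered, so only the absolute values are meaningful here.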

1. H. Ehlich and K. Zeller, Binäre Matrizen, Zeit. Angew. Math. Mech. 42 (1962), pages 20-21 of the 
Sonderheft. 

2. J. Williamson, Determinants whose elements are 0 and 1, Amer. Math. Monthly 53 (1946), 
427-434. 

Missing Real Numbers 


Christopher J. Van Wyk 


Authors of programs that implement mathematical algorithms soon confront the 
fact that computers do not have real arithmetic. Instead, most computers offer 
some form of floating-point arithmetic, which can be extremely fast, but is also a 
paltry substitute for real. Floating-point representation can be understood as 
scientific notation with a limited number of significant figures. Every floating-point 
system has a smallest $\varepsilon = 2^{-k}$ such that $1 + \varepsilon > 1$; thus, 
$(1 + \varepsilon/2) + \varepsilon/2 = 1$ while $1 + (\varepsilon/2 + \varepsilon/2) = 1 + \varepsilon > 1$, 
so floating-point addition is not even associative. Each 
floating-point arithmetic operation may commit a relative error of $\varepsilon$; over a long 
sequence of operations, such compounded errors can lead to wildly incorrect 
answers. An especially acute problem is catastrophic cancellation: the computed 
difference between two floating-point numbers that are nearly equal may include 
no significant figures at all; this makes it impossible in general to use floating-point 
arithmetic to test for exact equality [4]. Sometimes people are tempted to ignore all 
this and to use floating-point arithmetic as if it were real; numerical analysts have 
long railed against such “naive” use of floating-point [2]. 
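
The failure of associativity can be observed directly. The sketch below assumes IEEE double-precision floats with round-to-nearest-even, which is what Python's `float` provides on typical hardware; on such systems `sys.float_info.epsilon` is $2^{-52}$.

```python
import sys

eps = sys.float_info.epsilon     # smallest power of two with 1.0 + eps > 1.0
half = eps / 2

left = (1.0 + half) + half       # each addition rounds back down to 1.0
right = 1.0 + (half + half)      # half + half is exactly eps, so this is 1.0 + eps

not_associative = left != right
```

The two expressions differ only in grouping, yet `left` is exactly 1.0 while `right` is strictly larger.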

In computing with geometric objects, one promising approach that has emerged 
is to confine all (would-be) real arithmetic to a few functions and procedures. The 
rest of the program manipulates discrete information (whether logical, 
combinatorial, or topological) and calls on the arithmetic subprograms only when 
necessary. In effect, these subprograms behave as “black boxes”: various complicated 
and ungainly things may happen inside them, but the rest of the program need not 
be concerned about the details. Of course, this raises the question of how to write 
arithmetic subprograms that are robust and reliable. 

This column presents some exact-integer generalized predicates as examples of 
arithmetic subprograms. These predicates accept exact integers (usually user data) 
as input, compute the value of one or more integer polynomials in these inputs, 
and return as output a member of a small discrete set. For the first example, the 
set is {true, false}, so it is a predicate even in the ungeneralized sense. The 
computation of that example uses the generalized predicate sign() that returns 
one of {+1, 0, −1}, depending on the sign of its argument. 

Since the integer arithmetic native to computers also offers a limited number of 
digits of precision, we shall assume that some multiprecision integer arithmetic is 
used to compute the values of the polynomials when our integers outgrow the 
native precision. The many subroutine packages available to perform multiprecision 
integer arithmetic share at least one property: all run considerably slower than 
native integer or floating-point arithmetic. The slowdown worsens as the 
multiprecision integers get longer; multiplication time, for example, grows quadratically 
with increasing bitlength. Thus, we shall account carefully for the bitlengths 
required by our computations. 

Our first example of a predicate takes two line segments given by their four 
endpoints, $(a_i, b_i), (c_i, d_i)$, $i = 0, 1$, and tells whether they intersect transversally 
(i.e., not at their endpoints). An obvious but naive approach is to compute the 
rational intersection point $(p/w, q/w)$, where $p$, $q$, and $w$ are integers, as the 
solution to the simultaneous system 

\[ (c_i - a_i)(y - b_i) = (d_i - b_i)(x - a_i), \qquad i = 0, 1; \]

the line segments intersect transversally if and only if 
$\mathrm{sign}(p - w a_i) = \mathrm{sign}(w c_i - p) \ne 0$ and 
$\mathrm{sign}(q - w b_i) = \mathrm{sign}(w d_i - q) \ne 0$ for $i = 0, 1$. In general, this 
computation requires arithmetic in integers whose bitlength is at least three times that of 
the inputs $a_0$ through $d_1$. 

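A better approach is to test whether the endpoints of the first segment lie on 
opposite sides of the second segment, and vice versa. This amounts to testing 
whether 

\[ \mathrm{sign} \begin{vmatrix} a_i & b_i & 1 \\ c_i & d_i & 1 \\ a_{1-i} & b_{1-i} & 1 \end{vmatrix} = -\,\mathrm{sign} \begin{vmatrix} a_i & b_i & 1 \\ c_i & d_i & 1 \\ c_{1-i} & d_{1-i} & 1 \end{vmatrix} \ne 0 \tag{1} \]

for $i = 0, 1$, which requires arithmetic on integers only about twice as long as the 
inputs. 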
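
A minimal sketch of the test in Eq. 1, relying on Python's built-in arbitrary-precision integers in place of a multiprecision package; the function names are illustrative, not taken from the column.

```python
def sign(v):
    """Return +1, 0 or -1 according to the sign of the integer v."""
    return (v > 0) - (v < 0)


def orientation(x1, y1, x2, y2, x3, y3):
    """Sign of the determinant with rows (x1, y1, 1), (x2, y2, 1), (x3, y3, 1)."""
    return sign((x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1))


def cross_transversally(seg0, seg1):
    """True if the two integer segments cross at a point interior to both.

    Each segment is a pair of endpoints ((a, b), (c, d))."""
    segs = (seg0, seg1)
    for i in (0, 1):
        (a, b), (c, d) = segs[i]
        (a_other, b_other), (c_other, d_other) = segs[1 - i]
        s1 = orientation(a, b, c, d, a_other, b_other)
        s2 = orientation(a, b, c, d, c_other, d_other)
        if s1 == 0 or s1 != -s2:
            return False
    return True


crossing = cross_transversally(((0, 0), (4, 4)), ((0, 4), (4, 0)))
touching = cross_transversally(((0, 0), (4, 4)), ((4, 4), (8, 0)))
disjoint = cross_transversally(((0, 0), (1, 1)), ((3, 0), (3, 5)))
```

Because every intermediate value is an exact integer, the answer stays correct even for coordinates far beyond native word size.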

The determinant that appears in Eq. 1 of the line-segment intersection test is a 
special case of the “three-point orientation” predicate, which returns 1 if the 
points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ are oriented counterclockwise, $-1$ if they are 
oriented clockwise, and 0 if they are collinear. This predicate can be implemented 
as 

\[ \mathrm{sign} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}. \tag{2} \]

This is the only arithmetic primitive needed to compute the convex hull of a set of 
points, and it is useful in many algorithms on points and line segments [1]. 

Let us pause to consider just when Eq. 2 needs to be evaluated exactly. When 
the magnitude of the determinant is large enough, its sign can be read reliably 
from a floating-point approximation. When the magnitude is small, however, the 
floating-point approximation may have the wrong sign, and it is impossible to tell if 
the determinant is exactly zero. Recognizing the last condition is particularly 
important for computational geometry, since a zero determinant may well signal 
some kind of “degeneracy” in the input data; an example of degeneracy for Eq. 2 
is three collinear input points. 

To enable users to take advantage of some of these observations, thus getting 
the benefits of exact arithmetic without always having to pay the cost, Steve 
Fortune (AT&T Bell Laboratories) and I have implemented a program that 
generates arithmetic subprograms [3]. The user writes exact-integer predicates and 
specifies the precision of the input integers. Our program compiles each predicate 
into a C++ function that behaves as follows: first, evaluate the expression in 
floating-point arithmetic; return the sign if the expression is large enough that it is 
known to be reliable; otherwise, evaluate the expression exactly in multiprecision 
integer arithmetic and return the sign. 
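
The evaluate-then-fall-back pattern can be sketched for the orientation predicate of Eq. 2. This is not the generated C++ code described above: the error bound (a deliberately loose multiple of the unit roundoff), the coordinate limit, and the function names are all assumptions made for this illustration, with Python integers standing in for multiprecision arithmetic.

```python
UNIT_ROUNDOFF = 2.0 ** -53     # assumes IEEE double precision
COORD_LIMIT = 2 ** 52          # differences of such integers convert to floats exactly


def orient_exact(ax, ay, bx, by, cx, cy):
    """Sign of the orientation determinant, computed in exact integer arithmetic."""
    det = (ax - cx) * (by - cy) - (ay - cy) * (bx - cx)
    return (det > 0) - (det < 0)


def orient_filtered(ax, ay, bx, by, cx, cy):
    """Try floating point first; fall back to exact arithmetic when in doubt."""
    if all(abs(v) < COORD_LIMIT for v in (ax, ay, bx, by, cx, cy)):
        # Each difference is an integer below 2**53 in magnitude, so float() is exact;
        # only the two products and the final subtraction can round.
        left = float(ax - cx) * float(by - cy)
        right = float(ay - cy) * float(bx - cx)
        det = left - right
        # Conservative bound on the accumulated rounding error (assumed constant 8).
        bound = 8.0 * UNIT_ROUNDOFF * (abs(left) + abs(right))
        if abs(det) > bound:
            return 1 if det > 0 else -1
    return orient_exact(ax, ay, bx, by, cx, cy)
```

With well-separated inputs the exact branch is never reached; collinear or nearly collinear inputs, and inputs beyond the coordinate limit, take the slower exact path.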

We have used our program to implement several geometric algorithms. When 
degeneracies are few (so that the floating-point approximation usually suffices), 
the resulting programs run almost as fast as programs that use naive floating-point 
arithmetic. When the input data contains many degeneracies (as it often does in 
real life), the programs run more slowly than corresponding naive floating-point 
versions, but they are correct: they do not crash ignominiously because conclusions 
drawn from approximate answers have led to geometrically or topologically impos- 
sible situations. 

So far, all of our experimental programs have manipulated “flat” objects like 
points, line segments, lines, and planes. To see how practical and convenient 
exact-integer arithmetic predicates would be for curved objects, we thought about 
implementing predicates on circles. This quickly led us to consider a generalization 
of the three-point orientation predicate to circles: Given four circles 
$(x - h_i)^2 + (y - k_i)^2 = r_i^2$, $i = 0, 1, 2, 3$, in what order do the six intersection points of circles 
1, 2, and 3 with circle 0 appear around that circle? 

The obvious approach is to compute the coordinates of the six points directly. 
As we shall see, the direct algebraic approach is no better an idea here than it was 
before, so we merely indicate the outlines of the derivation. Begin by computing 
the line through both of the points at which circle $i$ and circle 0 intersect: 

\begin{align}
a_i &= 2(h_i - h_0) \tag{3} \\
b_i &= 2(k_i - k_0) \tag{4} \\
c_i &= h_0^2 + k_0^2 + r_i^2 - h_i^2 - k_i^2 - r_0^2 \tag{5} \\
a_i x + b_i y + c_i &= 0 \tag{6}
\end{align}

Next, we can solve Eq. 6 for $y$ in terms of $x$, substitute into the equation for circle 
0, and solve for the $x$-coordinates of the crossing points; in general, these will be 
quadratic irrationalities of the form $(A + \sqrt{B})/C$. To express an arithmetic 
comparison between two such quadratic irrationalities as an integer polynomial, we 
need to rearrange extensively and square both sides twice (all the while accounting 
for signs). Evaluating the final integer inequality requires arithmetic in integers at 
least 20 times as long as the original $h_i$, $k_i$, and $r_i$. 

Another way to compute the three-circle orientation predicate is to use some 
ideas from plane geometry [5]. Define the following: 

1. $A_i$, the radical axis of circles 0 and $i$, is the line given by Eq. 6. This line is 
perpendicular to the line through $(h_0, k_0)$ and $(h_i, k_i)$. Orient $A_i$ so that 
$(h_0, k_0)$ lies to its right. 

2. $X_i^-$ and $X_i^+$, the two points at which circle $i$ crosses circle 0, lie on $A_i$. 
Label the points so that $X_i^-$ precedes $X_i^+$ along the oriented $A_i$. 

3. $C_{ij}$, the radical center of circles 0, $i$, and $j$, is the intersection of $A_i$ and $A_j$. 
We note that $C_{ij}$ also lies on the radical axis of circles $i$ and $j$. 

4. $P_i$ is the perpendicular projection of $(h_0, k_0)$ onto $A_i$. 


The radical axes $A_i$ form an arrangement of oriented lines in two-dimensional 
space, which we can compute explicitly. To compute the three-circle orientation, 
however, we need to know how the six crossing points $X_i^-$ and $X_i^+$ lie on the 
three radical axes. The following observations allow us to determine the positions 
of the crossings relative to the radical centers without computing their coordinates 
explicitly: 

1. If $C_{ij}$ lies inside circle 0, then $C_{ij}$ lies between $X_i^-$ and $X_i^+$ on $A_i$. (For an 
example, take $i = 1$ and $j = 3$ in Figure 1.) 

2. If $C_{ij}$ and $C_{ik}$ both lie outside circle 0, and if $P_i$ lies between $C_{ij}$ and $C_{ik}$ on 
$A_i$, then $X_i^-$ and $X_i^+$ lie between $C_{ij}$ and $C_{ik}$ on $A_i$. (For an example, take 
$i = 1$, $j = 2$, and $k = 3$ in Figure 1.) 

3. If $C_{ij}$ and $C_{ik}$ both lie outside circle 0, and if $P_i$ lies to one side of both $C_{ij}$ 
and $C_{ik}$, then $X_i^-$ and $X_i^+$ both lie to that same side of $C_{ij}$ and $C_{ik}$ on $A_i$. 

Figure 1. An example of observations (1) and (2) about radical centers and the orientation of crossings. 
The complete circle is circle 0; only arcs of the other circles are shown. 

Applying these observations allows us to determine which of a finite number of 
possible arrangements of crossing points $X_i^{\pm}$ and oriented lines $A_i$ we have. In 
other words, even if we erased all of the arcs in Figure 1, we could still determine 
the parts of the dashed lines on which each of the $X_i^{\pm}$ lie, from which we could 
deduce the order in which the six $X_i^{\pm}$ appear around circle 0, all without 
computing any of the coordinates of $X_i^{\pm}$. 

To accomplish this feat, we need only construct six topological arcs such that 
the following three properties hold: (1) Each arc begins at one crossing point and 
ends at another. (2) The six arcs together form a single, simple cycle. (3) No arc 
crosses any of the lines $A_i$. If the boundary of a region in the arrangement 
contains only two crossing points, like many of the infinite regions in Figure 1, the 
choice of arc is forced by requirements (1) and (3). If a region has more than two 
crossing points on its boundary, like the triangle in Figure 1, the choice of arcs is 
determined by requirements (1) and (2). In either case, the critical observation is 
that the choice of arc depends only on topological relationships, not on the actual 
coordinates of $X_i^{\pm}$. 

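To implement this version of the three-circle predicate, we need to compute 
quantities like 

1. $C_{ij} = (p_{ij}/w_{ij},\ q_{ij}/w_{ij})$, where 

\[ p_{ij} = \begin{vmatrix} -c_i & b_i \\ -c_j & b_j \end{vmatrix}, \qquad q_{ij} = \begin{vmatrix} a_i & -c_i \\ a_j & -c_j \end{vmatrix}, \qquad w_{ij} = \begin{vmatrix} a_i & b_i \\ a_j & b_j \end{vmatrix}, \tag{7} \]

assuming the definitions of $a_i$, $b_i$, and $c_i$ given by Eqs. 3-5; 

2. whether the radical center $C_{ij}$ is inside circle 0: 

\[ \mathrm{sign}\!\left((p_{ij} - h_0 w_{ij})^2 + (q_{ij} - k_0 w_{ij})^2 - r_0^2 w_{ij}^2\right); \tag{8} \]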


3. the order of $C_{ij}$, $C_{ik}$, and $P_i$ along $A_i$; we test this by testing the signs of $C_{ij}$ 
and $C_{ik}$ in the following expression, whose zero-locus is the line 
perpendicular to $A_i$ through the center of circle 0: 

\[ (y - k_0)(p_{ij} w_{ik} - p_{ik} w_{ij}) - (x - h_0)(q_{ij} w_{ik} - q_{ik} w_{ij}). \tag{9} \]


The quantities in Eq. 7 are two to three times the bitlengths of the inputs $h_i$, $k_i$, 
and $r_i$, while Eqs. 8 and 9 are about six times the bitlengths of the inputs, the latter 
after some more algebraic simplification. 
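
The quantities in Eqs. 3-8 translate directly into exact integer code. The sketch below represents a circle as a tuple `(h, k, r)` of integers; that representation and the function names are assumptions made for illustration.

```python
def sign(v):
    """Return +1, 0 or -1 according to the sign of the integer v."""
    return (v > 0) - (v < 0)


def radical_axis(circle0, circle_i):
    """Coefficients (a, b, c) of the line a*x + b*y + c = 0, as in Eqs. 3-6."""
    h0, k0, r0 = circle0
    hi, ki, ri = circle_i
    a = 2 * (hi - h0)
    b = 2 * (ki - k0)
    c = h0 * h0 + k0 * k0 + ri * ri - hi * hi - ki * ki - r0 * r0
    return a, b, c


def radical_center(circle0, circle_i, circle_j):
    """Integers (p, q, w) with C_ij = (p/w, q/w), as in Eq. 7."""
    ai, bi, ci = radical_axis(circle0, circle_i)
    aj, bj, cj = radical_axis(circle0, circle_j)
    p = -ci * bj + cj * bi      # determinant with rows (-c_i, b_i), (-c_j, b_j)
    q = -ai * cj + aj * ci      # determinant with rows (a_i, -c_i), (a_j, -c_j)
    w = ai * bj - aj * bi       # determinant with rows (a_i, b_i), (a_j, b_j)
    return p, q, w


def center_side(circle0, p, q, w):
    """Eq. 8: -1 if C_ij is inside circle 0, 0 if on it, +1 if outside (w nonzero)."""
    h0, k0, r0 = circle0
    return sign((p - h0 * w) ** 2 + (q - k0 * w) ** 2 - r0 * r0 * w * w)


c0, c1, c2 = (0, 0, 5), (6, 0, 5), (0, 6, 5)
p, q, w = radical_center(c0, c1, c2)
side = center_side(c0, p, q, w)
```

For these three circles the radical axes are $x = 3$ and $y = 3$, so the radical center is $(3, 3)$, which lies inside circle 0.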

While this second version of the computation is conceptually more involved, it 
realizes a substantial reduction in the required bitlength for multiprecision arith- 
metic operations (from 20 to 6). The savings is considerably greater if the 
coefficients in the original circle equations are allowed to be rational instead of 
being restricted to integers. The extensive and intricate geometric reasoning 
required does, however, call into question the practicality of basing reliable 
arithmetic exclusively on integer computation. 

Another possible foundation for exact computation would be a system for 
representing algebraic numbers. Here, a number $x$ is represented as a polynomial 
$p$ and an open interval $(a, b)$ such that $x \in (a, b)$, $p(x) = 0$, and $p(y) \ne 0$ for 
$y \in (a, b)$, $y \ne x$ [6]. In such a system, we could use the rational parameterization 
of the circle to compute the parameters of the intersection points as the solutions 
to quartic equations; sorting these parameters would tell the three-circle 
orientation. Efforts to make it practical to compute using polynomials and root-isolating 
intervals are underway [7]. 
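
A toy version of this representation, using exact rationals and bisection to shrink the isolating interval, might look as follows. It is only a sketch under the extra assumption that the root is simple, so that $p$ changes sign across the interval; the systems cited above are far more sophisticated.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Horner evaluation; coeffs run from the leading coefficient to the constant term."""
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value


def refine(coeffs, lo, hi, width):
    """Shrink the isolating interval (lo, hi) by bisection until hi - lo <= width.

    Assumes the polynomial is nonzero at both ends and has exactly one simple
    root in between. If a midpoint is the root itself, (mid, mid) is returned."""
    lo, hi = Fraction(lo), Fraction(hi)
    positive_at_lo = evaluate(coeffs, lo) > 0
    while hi - lo > width:
        mid = (lo + hi) / 2
        value = evaluate(coeffs, mid)
        if value == 0:
            return mid, mid
        if (value > 0) == positive_at_lo:
            lo = mid
        else:
            hi = mid
    return lo, hi


# The square root of 2 as the root of x^2 - 2 isolated in (1, 2).
lo, hi = refine([1, 0, -2], 1, 2, Fraction(1, 10 ** 6))
```

Two such numbers can be compared by refining both intervals until they no longer overlap, which is one reason equality testing is the expensive case.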

With a sufficiently powerful symbolic-algebra system, one could even write the 
three-circle orientation test by solving for the coordinates of the intersection 
points, then sorting some arctangents. Such a simple real formulation of the 
solution is undeniably appealing, but it is likely to be even more expensive than 
using algebraic numbers. This dilemma accounts for the title of this column. Not 
only are the real numbers missing from computer arithmetic: sometimes we find 
ourselves really missing them. 

Applied Mathematics 


On page 920 of the November issue of the Monthly, you quote 
George Steiner saying that “applied mathematics is a higher plumbing, 
a kind of music for the police band.” Poincaré, who observed that 
“nature not only poses for us problems, she suggests the method of 
solution,” would have disagreed violently; so would have Newton, 
Euler, Lagrange, Riemann, Hilbert, Hadamard, Wiener, Weyl, 
Birkhoff, von Neumann, Kac and Friedrichs. The only mathematician 
that comes to mind who expressed such views is G.H. Hardy; “A 
Mathematician’s Apology” is marred by just such exaltation of the 
uselessness of mathematics. The great chemist Soddy put him down 
harshly but justly: “From such cloistral clowning the world sickens.” 

November must be snobbery month, for on page 901 you quote 
Alfred Adler: “Each generation has its few great mathematicians, and 
mathematics would not even notice the absence of the others.” 
Evidently Adler is not familiar with the principle of the integral 
calculus. Let’s hope that in the meanwhile Adler has managed to 
overcome his inferiority complex. 


Peter D. Lax 

Courant Institute of Mathematical Sciences 
New York University 

251 Mercer Street 

New York, NY 10012-1185 
THE EVOLUTION OF... 
Edited by Abe Shenitzer 
Mathematics, York University, North York, Ontario M3J 1P3, Canada 


The Evolution of Algebra 1800–1870* 


I. G. Bashmakova and A. N. Rudakov 


The first event of this period was the appearance, in 1801, of C. F. Gauss’ 
Disquisitiones Arithmeticae. Of the seven parts of the book only one is devoted to 
an algebraic issue, namely the cyclotomic equation $x^n - 1 = 0$. But the author’s 
brilliant algebraic thinking is apparent in all the other parts as well. Disquisitiones, 
an epoch-making work in algebraic number theory, was for a long time a handbook 
and source of ideas in algebra. In the course of his study of the cyclotomic 
equation Gauss shows that it is solvable for every n in the sense that the solutions 
are expressible in terms of radicals, gives a method for explicitly finding these 
expressions, and singles out the values of n for which the solutions are expressible 
in quadratic radicals and thus the values of n for which it is possible to construct a 
regular n-gon by means of ruler and compass. As always, his investigations are 
strikingly profound and detailed. They were continued by N. H. Abel, who proved 
the insolvability by radicals of the general quintic and singled out a class of 
equations, now named for him, that are solvable by radicals. The new notions of 
field (domain of rationality) and group (group of an equation) turned up in Abel’s 
papers with greater definiteness. The next step in this direction that completed the 
theory was the papers of the young E. Galois, published in fragmentary form 
between 1830 and 1832, and, after his death, in more complete form by Liouville in 
1846. 

The papers of Abel, and especially of Galois, already belong to the radically 
new trend of ideas now generally accepted in algebra. In his study of the ancient 
problem of solution of equations by radicals Galois shifted the center of gravity 
from the problem to the methods of its solution: he gave clear-cut definitions of 
the concepts of a field and of the group of an equation, established the correspon- 
dence between the subgroups of the group of an equation and the subfields of the 
splitting field of the polynomial on the left side of that equation, and, finally, 
singled out the normal subgroups of a group and studied its composition series. 
These were completely new and extremely fruitful methods of investigation and yet 


*This article is a reprint of the major part of the introduction to an essay dealing with the evolution 
of algebra and algebraic number theory during the period of 1800-1870. The essay forms Chapter 2 of 
the book Mathematics of the 19th Century that deals with mathematical logic, algebra, number theory, 
and probability theory in the 19th century. Chapter 2 was written by I. G. Bashmakova and A. N. 
Rudakov with the assistance of A. N. Parshin and E. I. Slavutin. The book was published in 1992 by 
Birkhäuser Verlag and is a translation of a Russian book published by Nauka in 1978. (The reprinted 
material is found on pp. 36–40 of the Birkhäuser Verlag book.) Reprinted with permission. 
they were apprehended by mathematicians only in the 70s. The one exception was 
groups of substitutions. Such groups were considered by Galois and their investiga- 
tion began already in the 40s. 

Another source of group theory was Gauss’ theory of composition of classes of 
forms. In this theory one applied an operation analogous to addition (or multipli- 
cation) of numbers to objects very different from numbers. Gauss’ study of forms 
of the same discriminant was in effect a study of the fundamental properties of 
cyclic and general abelian groups. 

The two parts of Gauss’ remarkable paper “The theory of biquadratic residues” 
appeared in 1828 and 1832, respectively. In it Gauss not only gave a geometric 
interpretation of the complex numbers (this was done before him) but also—and 
this is very important—transferred to complex numbers the notion of a whole 
number, a concept that seemed inseparable from the rational integers for more 
than 2000 years. 

Gauss constructed an arithmetic of complex integers entirely analogous to the 
usual arithmetic and used the new numbers to formulate the law of biquadratic 
reciprocity. This opened for arithmetic boundless new horizons. Soon Eisenstein 
and Jacobi formulated and proved the law of cubic reciprocity and used for this 
purpose numbers of the form $k + m\rho$, $\rho^3 = 1$, $\rho \neq 1$, and in 1846 P. Lejeune-Dirichlet 
found all units (that is, invertible elements) of the ring of integers of the 
field $\mathbb{Q}(\theta)$, where $\theta$ is a root of 


$$x^n + a_1 x^{n-1} + \cdots + a_n = 0,$$ 


$a_i \in \mathbb{Z}$.¹ This paper, with its deep results in the theory of algebraic numbers, is also 
of interest from the point of view of group theory: in it Dirichlet constructed the 
first nontrivial example of an infinite abelian group and investigated its structure. 

Further progress in algebraic number theory was linked to reciprocity laws and 
to Fermat’s last theorem. Attempts to prove this theorem brought E. Kummer to 
the study of the arithmetic of fields $\mathbb{Q}(\zeta)$, $\zeta^p = 1$, $\zeta \neq 1$. In 1844-1847 Kummer 
discovered that if one defines a “prime” number to be an indecomposable integer 
in a field $\mathbb{Q}(\zeta)$, then the law of unique factorization into prime factors fails for the 
integers in $\mathbb{Q}(\zeta)$. To “save the day” and restore the possibility of constructing an 
arithmetic analogous to the usual (arithmetic) he introduced ideal factors. In so 
doing, he laid the foundations for the subtlest and most abstract theories of 
algebraic number theory. Kummer’s methods were local. They were further devel- 
oped by E. I. Zolotarev, K. Hensel, and others, and now form the core of 
commutative algebra. 

Linear algebra continued to develop in the first half of the 19th century. In this 
connection, the first thing to be noted is that whereas no part of Gauss’ Disquisi- 
tiones deals directly with linear algebra, its advance was bound to be furthered by 
the detailed study of integral quadratic forms in two variables contained in that 
work. A. Cauchy’s “On an equation for the determination of the secular inequali- 
ties of planetary motions” (1826) dealt implicitly with the eigenvalues of matrices 
of arbitrary order. Somewhat later, in 1834, there appeared C. G. J. Jacobi’s “On 
the transformation of two arbitrary homogeneous functions of the second order by 
means of linear substitutions into two others containing only squares of the 
variables; together with many theorems on the transformation of multiple 
integrals” in which he explicitly studied quadratic forms and their reduction to 


¹ $\mathbb{Z}$ is the ring of integers and $\mathbb{Q}$ is the field of rational numbers. 
canonical form. Jacobi also perfected the theory of determinants (1841). What was 
still lacking in this theory was geometric features and, above all, the all-important 
and fundamental notion of a linear space. The first, none-too-clear definition of a 
linear space was given by H. Grassmann in his Die lineale Ausdehnungslehre of 
1844. This work, rich in new ideas but written in a muddled manner, first attracted 
attention when its author published a reworked and improved version in 1862. In 
particular, the work contains a construction of exterior products and the now 
famous Grassmann algebra. In 1843 there appeared A. Cayley’s Chapters in the 
analytical geometry of (n) dimensions, a work less rich in ideas but better known to 
contemporary mathematicians. There is a close connection between the develop- 
ment of linear algebra and the theory of hypercomplex numbers (now known as the 
theory of algebras) which elicited considerable interest at the time. Years of 
fruitless attempts to generalize the complex numbers were crowned with success in 
1843 by W. R. Hamilton’s discovery of the quaternions. Hamilton studied the 
quaternions for over 20 years, for the rest of his life. His researches are summa- 
rized in two fundamental works: Lectures on quaternions (1853) and Elements of 
the theory of quaternions (1866). Their subsequent significance is due not so much 
to quaternions but to the new notions and methods of “vector calculus” intro- 
duced in this connection. 

To resume our account of the further development of group theory we mention 
the series of A. Cauchy’s papers, published between 1844 and 1846, in which he proves 
a great variety of theorems on groups of substitutions (subgroups of the symmetric 
group), including the famous theorem of Cauchy to the effect that a group whose 
order is divisible by a prime p contains an element of order p. A further major 
event in the history of group theory was the publication—in three parts (1854, 
1854, 1859)—of Cayley’s paper On the theory of groups, as depending on the 
symbolic equation $\theta^n = 1$. Following the spirit of the English school, Cayley views a 
group as an abstract set of symbols with a given law of composition and defines a 
number of fundamental notions of abstract group theory, chief among them being 
the notions of a group and of isomorphism. This was a notable step in the 
evolution of the new abstract mathematical thinking. 

Of crucial importance for the further development of group theory was the 
appearance, in 1870, of C. Jordan’s fundamental Traité des substitutions et des 
équations algébriques. This work contained the first systematic and complete 
exposition of Galois theory as well as a detailed presentation of results in group 
theory up to that time, including Jordan’s own significant results in these areas. In 
it Jordan also introduced what is now known as the Jordan canonical form of 
matrices of linear transformations. The publication of Jordan’s work was a major 
event in all of mathematics. 

Mention must be made of the flourishing, in the middle of the 19th century, of 
an area of algebra intermediate between linear algebra and algebraic geometry 
known as the theory of invariants. On the one hand, its content consists in the 
generalization and development of topics in linear algebra such as reduction to 
canonical form of quadratic forms and matrices of linear transformations. On the 
other hand, it is the study, in concrete situations, of the answer to the following 
question: “Given a geometric object determined in some coordinate system by 
certain algebraic conditions, find a way of obtaining from the algebraic conditions 
geometric characteristics of the object that are invariant with respect to coordinate 
transformations.” Between 1840 and 1870 many of the works of various mathematicians 
dealt with the determination of systems of invariants in different concrete 
situations. The best known are the works of Cayley, Eisenstein, Sylvester, Salmon 
and Clebsch. In this connection one must single out two papers by Hesse, 
published in 1844 and 1851, respectively, in which he introduced the notion of a 
hessian and applied it to geometry, and P. Gordan’s famous 1868 paper in which 
he proved a general algebraic theorem on the existence of a finite system of base 
invariants. An important paper close to these investigations is Cayley’s A sixth 
memoir upon quantics (1859). In it Cayley showed how to consider the metric 
properties of geometric figures from the single viewpoint of the theory of invari- 
ants. This paper was one of the sources of F. Klein’s Erlangen Program that 
resulted in revolutionary changes in geometry. 

At that time, an important achievement in linear algebra was Sylvester’s 1852 
proof of the law of inertia of quadratic forms, presented in the paper Proof of the 
theorem that every homogeneous quadratic polynomial can be reduced by means of a 
real orthogonal substitution to the form of a sum of positive and negative squares. It 
was proved, but not published, somewhat earlier by Jacobi. In 1858 there appeared 
Cayley’s Memoir on the theory of matrices. In it Cayley introduced the algebra of 
square matrices and established the isomorphism between the algebra of quater- 
nions and a certain algebra of second-order matrices (a subalgebra of the algebra 
of all square second-order complex matrices). This work was of great importance 
for the clarification of the relation between the theory of algebras and linear 
algebra. 

In the sixties, the activities of K. Weierstrass had an important influence on the 
development of mathematics. He published virtually nothing but included the 
results of his investigations in his lectures at Berlin University. In his 1861 lectures 
Weierstrass introduced the notion of a direct sum of algebras and showed that 
every (finite dimensional) commutative algebra (over the field of real numbers) 
without nilpotent elements is the direct sum of copies of the fields of real and 
complex numbers. This was one of the earliest classification results in algebra. 

One of the main problems of algebraic number theory in the sixties and 
seventies was the extension of Kummer’s divisibility theory from cyclotomic fields 
to general algebraic number fields. This was accomplished in three different 
constructions due, respectively, to E. I. Zolotarev, R. Dedekind, and L. Kronecker. 
Of the three, it was Dedekind’s work—the Xth Supplement to Dirichlet’s lectures 
on number theory published in 1871 and the XIth Supplement to subsequent 
editions—that was accepted by all mathematicians as the solution of the problem. 
Dedekind’s clear, algebraically transparent, account became the model of mathe- 
matical style for many decades to come. By this and other works Dedekind laid the 
foundations of the contemporary axiomatic presentation of mathematical theories. 

In our survey of the evolution of algebra we have not touched on the theory of 
elliptic and abelian functions—one of the central lines of development of 19th- 
century mathematics, an area in which Gauss, Abel, Jacobi, Clebsch, Gordan, 
Weierstrass and many others invested great efforts. In the 19th century this area 
belonged primarily to analysis, more specifically to the theory of functions of a 
complex variable, and it was only gradually, especially at the end of the 19th 
century, that the role of algebraic ideas in it became very significant. 

The algebraization of the area began with Dedekind’s transfer of his theory, in a 
joint work with H. Weber (1882), to the field of algebraic functions. This estab- 
lished the deep parallelism between the theories of algebraic numbers and alge- 
braic functions and was the decisive step for an abstract definition of the concepts 
of field, module, ring, and ideal. From the end of the last century ideas began to 
flow in the opposite direction, from the theory of algebraic functions to number 
theory. This resulted in the introduction of p-adic numbers and topology by means 
of p-adic metrics. But this is already part of the mathematics of the present 
century. 

The evolution of the ideas, methods, and theories just described resulted in the 
creation of abstract “modern algebra” and, later, of algebraic geometry whose 
flourishing we witness today. 


Mathematics 


Monthly readers may be interested in the following paragraphs from Smilla’s Sense of Snow by 
Peter Høeg. (This book was published in the United States in 1993 by Farrar, Straus, and 
Giroux, a translation from Danish, translated by Tiina, originally published in Copenhagen in 
1992.) The narrator, Smilla, is a young woman who loves mathematics and refers many times to 
mathematics and mathematicians—including Cantor, Dedekind, Fermat, Newton, and Euclid’s 
Elements. On pages 112–113, we find the following: 


It seems necessary to explain my claustrophobia to him. 

“Do you know what the foundation of mathematics is?” I ask. “The foundation of 
mathematics is numbers. If anyone asked me what makes me truly happy, I would say: 
numbers. Snow and ice and numbers. And do you know why?” 

He splits the claws with a nutcracker and pulls out the meat with curved tweezers. 

“Because the number system is like human life. First you have the natural numbers. 
The ones that are whole and positive. The numbers of a small child. But human 
consciousness expands. The child discovers a sense of longing, and do you know what the 
mathematical expression is for longing?” 

He adds cream and several drops of orange juice to the soup. 

“The negative numbers. The formation of the feeling that you are missing something. 
And human consciousness expands and grows even more, and the child discovers the in 
between spaces. Between stones, between pieces of moss on the stones, between people. 
And between numbers. And do you know what that leads to? It leads to fractions. Whole 
numbers plus fractions produce rational numbers. And human consciousness doesn’t stop 
there. It wants to go beyond reason. It adds an operation as absurd as the extraction of 
roots. And produces irrational numbers.” 


He warms French bread in the oven and fills the pepper mill. 

“It’s a form of madness. Because the irrational numbers are infinite. They can’t be 
written down. They force human consciousness out beyond the limits. And by adding 
irrational numbers to rational numbers, you get real numbers.” 

I’ve stepped into the middle of the room to have more space. It’s rare that you have a 
chance to explain yourself to a fellow human being. Usually you have to fight for the 
floor. And this is important to me. 

“It doesn’t stop. It never stops. Because now, on the spot, we expand the real numbers 
with imaginary square roots of negative numbers. These are numbers we can’t picture, 
numbers that normal human consciousness cannot comprehend. And when we add the 
imaginary numbers to the real numbers, we have the complex number system. The first 
number system in which it’s possible to explain satisfactorily the crystal formation of ice. 
It’s like a vast, open landscape. The horizons. You head toward them and they keep 
receding. That is Greenland, and that’s what I can’t be without. That’s why I don’t want 
to be locked up.” 

Because Smilla Jaspersen knows about snow and ice, she is suspicious about the cause of 
death of her young friend, Isaiah. Smilla’s Sense of Snow is the story of her investigation and its 
impact on Smilla, herself. A well-written book; I enjoyed it a lot. 


JoAnne S. Growney 

Department of Mathematics and Computer Science 
Bloomsburg University 

Bloomsburg, PA 17815 
PROBLEMS AND SOLUTIONS 


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


Proposed problems should be sent to the MONTHLY PROBLEMS address given on 
the inside front cover. Please include solutions, relevant references, etc. Three copies 
are requested. 


Solutions of published problems should arrive before August 31, 1995 at the MONTHLY 
PROBLEMS address given on the inside front cover. Solutions should be typed 
with double spacing, including the problem number and the solver’s name and mailing 
address. Two copies suffice. A self-addressed postcard or label should be included if 
an acknowledgement is desired. 


An asterisk (*) after the number of a problem, or part of a problem, indicates that no 
solution is currently available. Partial solutions will be useful in such cases. Otherwise, 
the published solution is likely to be based on a solution which is complete and 
correct. Of course, an elegant partial solution or a method leading to a more general 
result is always useful and welcome. In addition, references to other appearances 
of MONTHLY problems or to solutions of these problems in the literature are also 
solicited. 


PROBLEMS 


10438. Proposed by Hunter S. Snevily, University of Idaho, Moscow, ID. 

Let $S = n_1, n_2, \ldots, n_k$ be a sequence of positive integers with sum $n$. Suppose that 
$n < 2k$. Show that, for all $q$ with $1 \le q \le n$, there is a subsequence of $S$ with sum $q$. 
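
Readers who want to experiment before attempting a proof can test small cases by brute force. The sketch below assumes the hypotheses read n < 2k and 1 <= q <= n (the inequality signs are partly an inference from a damaged printing), and the function names are illustrative only. Because the order of the terms does not affect which sums are attainable, it is enough to enumerate nonincreasing sequences.

```python
def partitions(n, k, max_part=None):
    """Yield nonincreasing k-tuples of positive integers with sum n."""
    if max_part is None:
        max_part = n
    if k == 0:
        if n == 0:
            yield ()
        return
    # The first part must leave at least 1 for each of the remaining k - 1 parts.
    for first in range(min(n - (k - 1), max_part), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def subsequence_sums(seq):
    """Return the set of sums of all subsequences of seq (including the empty one)."""
    sums = {0}
    for term in seq:
        sums |= {s + term for s in sums}
    return sums


def claim_holds(k, n):
    """Check that every length-k sequence with sum n attains every q in 1..n."""
    targets = set(range(1, n + 1))
    return all(targets <= subsequence_sums(p) for p in partitions(n, k))
```

Calling `claim_holds(k, n)` for small `k` and all `n` with `k <= n < 2k` gives a quick check of the statement, and `claim_holds(3, 6)` shows why the bound matters: the sequence 2, 2, 2 has no subsequence with sum 1.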
10439. Proposed by Charles Vanden Eynden, Illinois State University, Normal, IL. 


The rational number 1/9 is an example of a number c in [0, 1] such that the decimal 
representation of neither $c$ nor $\sqrt{c}$ contains the digit 0. Find an irrational number with the 
same property. 


10440. Proposed by Marius Cavachi, Constanta, Romania. 


Show that the Euclidean plane cannot be covered with circular disks having mutually 
disjoint interiors. 
10441. Proposed by Emre Alkan (student), Bosphorus University, Istanbul, Turkey. 
Given $k + 1$ positive real numbers $x_0, \ldots, x_k$ and a positive integer $n$, show that 
Do Gor +++ +o)" Sa yar" 
oO 


where the sum on the left is taken over the $k + 1$ distinct $k$-element subsets of $\{x_0, \ldots, x_k\}$. 


10442. Proposed by Roger Bielawski, McMaster University, Hamilton, Ontario, Canada. 
Let $f$ be a continuous function from the unit disc $D$ in $\mathbb{R}^2$ to itself such that: 
$f \circ f$ is the identity of $D$; and 
$f$ is the identity on the unit circle $\partial D$. 
Show that f is the identity on D. 


10443. Proposed by Ernesto Bruno Cossi, Universidade Federal do Rio Grande do Sul, Porto 
Alegre, Brazil. 


Let $X$ be a topological space. Suppose that there is a mapping $f$ from the underlying set 
of $X$ to $\mathbb{R}$ such that $\lim_{x \to v} f(x) = +\infty$ (in the usual topology on $\mathbb{R}$) for all limit points $v$ 
of $X$. Prove that every compact subset of $X$ is denumerable. 


10444. Proposed by Richard L. Bishop and Harold G. Diamond, University of Illinois, Urbana, 
IL. 


Let $\Psi$ denote the class of measurable functions $\psi$ on $[0, 1]$ with $0 \le \psi(x) \le x$ for all 
$x \in [0, 1]$. Let 


$$F(\psi) = \int_0^1 \left( \psi(x) - \int_0^1 \psi(t)\,dt \right)^2 dx$$ 


and 


$$M = \sup\{F(\psi) : \psi \in \Psi\}.$$ 


Find $M$ and show that the extreme value is attained. 


SOLUTIONS 


Counting Pairs of Permutations by Binomial Coefficients 


10233 [1992, 571]. Proposed by M.A. Khan, RDSO, Lucknow, India. 


For any odd positive integer $n = 2r - 1$, prove that 


$$\sum_{k=0}^{n} \frac{(-1)^k}{k+1} \binom{n}{k} \sum_{j=0}^{r-1} (-1)^j \binom{n}{j} (r-j)^{n-k} = \frac{n!}{2}.$$ 


Solution by Richard Holzsager, The American University, Washington, DC. When we reverse 
the order of summation and substitute $\frac{1}{n+1}\binom{n+1}{k+1}$ for $\frac{1}{k+1}\binom{n}{k}$, the new inner sum of terms 
involving $k$ is $\sum_{k=0}^{n} (-1)^k \binom{n+1}{k+1} (r-j)^{n-k}$. Summing this by the binomial formula converts 
the original expression to 


$$\frac{1}{n+1} \sum_{j=0}^{r-1} (-1)^j \binom{n}{j} \left( (r-j)^{n+1} - (r-j-1)^{n+1} \right).$$ 


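We write this as two sums, change $j$ to $j - 1$ in the second sum and recombine, using the 
fact that $\binom{n}{j-1}(r-j)^{n+1} = 0$ when $j = 0$ and $j = r$, to obtain 


$$\frac{1}{n+1} \sum_{j=0}^{r} (-1)^j \left( \binom{n}{j} + \binom{n}{j-1} \right) (r-j)^{n+1} = \frac{1}{n+1} \sum_{j=0}^{r} (-1)^j \binom{n+1}{j} (r-j)^{n+1}.$$ 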


The summand here is unchanged when $j$ is replaced by $n + 1 - j = 2r - j$, and it is 0 when 
$j = r$. Thus the original sum equals $\frac{1}{2(n+1)} \sum_{j=0}^{n+1} (-1)^j \binom{n+1}{j} (r-j)^{n+1}$. The summation 
in this expression is the $(n + 1)$st (backward) difference of the polynomial $r^{n+1}$. Since the 
$(n + 1)$st difference of any polynomial in $r$ of degree $n + 1$ is $(n + 1)!$ times its leading 
coefficient, the original summation equals $\frac{(n+1)!}{2(n+1)} = n!/2$. 
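
As a sanity check, the double sum can be evaluated exactly with rational arithmetic. The sketch below assumes the identity reads sum over k = 0..n of (-1)^k/(k+1) C(n,k) times the sum over j = 0..r-1 of (-1)^j C(n,j) (r-j)^(n-k), a reading inferred from the steps of the solution; the function name `double_sum` is illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def double_sum(n, r):
    """Evaluate sum_k (-1)^k/(k+1) C(n,k) * sum_{j<r} (-1)^j C(n,j) (r-j)^(n-k) exactly."""
    total = Fraction(0)
    for k in range(n + 1):
        inner = sum((-1) ** j * comb(n, j) * (r - j) ** (n - k) for j in range(r))
        total += Fraction((-1) ** k * comb(n, k), k + 1) * inner
    return total


def identity_holds(r):
    """Check the case n = 2r - 1, where the value should be n!/2."""
    n = 2 * r - 1
    return double_sum(n, r) == Fraction(factorial(n), 2)
```

For instance, `double_sum(3, 2)` works out to 5 - 3/2 - 1 + 1/2 = 3, which is 3!/2.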


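Editorial comment. John Henry Steelman proved in a similar manner the more general 
result that for any nonnegative integers $n$ and $r$ with $r \le n + 1$, 


$$\sum_{k=0}^{n} \frac{(-1)^k}{k+1} \binom{n}{k} \left( \sum_{j=0}^{r-1} (-1)^j \binom{n}{j} (r-j)^{n-k} + \sum_{j=0}^{n-r} (-1)^j \binom{n}{j} (n+1-r-j)^{n-k} \right) = n!.$$ 


This reduces to the proposed identity when $n = 2r - 1$. 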


Solved also by J. Anglesio (France), J. C. Binz (Switzerland), P. Bracken (Canada), R. J. Chapman (U. K.), 
W. Y. C. Chen, J. Fukuta (Japan), W. T. Gan (student, U. K.), H. van Haeringen (The Netherlands), M. E. H. Is- 
mail, N. Komanda, O. P. Lossers (The Netherlands), M. Mécsy (Hungary), K. Perera (student), C. R. Pranesachar 
(India), V. S. Ryko (Russia), E. Schmeichel, H.-J. Seiffert (Germany), J. H. Steelman, D. Zeilberger (as O. Khayyam & 
L. Euler), USA Mathematical Olympiad Program, and the University of Wyoming Problem Circle. 


Simultaneous Squares 


10238 [1992, 674]. Proposed by David M. Bloom, Brooklyn College of CUNY, Brooklyn, NY. 


(a) Show that there exist infinitely many positive integers a such that both a + 1 and 
3a + 1 are perfect squares. 

(b) Let $a_1 < a_2 < \cdots$ be the sequence of all solutions of (a). Show that $a_n a_{n+1} + 1$ is 
also a perfect square. 


Solution by Roman Drnovšek (student), Institute of Mathematics, Physics, and Mechanics, 
Ljubljana, Slovenia. If $a + 1 = x^2$ and $3a + 1 = y^2$, then $y^2 - 3x^2 = -2$. The hypothesis 
modulo 4 implies that $x$ and $y$ must be odd, allowing the equation to be rewritten as 
$\left(\frac{3x-y}{2}\right)^2 - 3\left(\frac{y-x}{2}\right)^2 = 1$. Consider only the solutions in which $x$ and $y$ are positive integers. Then, 
clearly, $x < y < 3x$, so $u = (3x - y)/2$ and $v = (y - x)/2$ satisfying the Pell equation 
$u^2 - 3v^2 = 1$ are also positive integers. The smallest positive solution is $(u_1, v_1) = (2, 1)$, 
and it is well known that the solutions $\{(u_n, v_n)\}$ satisfy $u_n + v_n\sqrt{3} = (u_1 + v_1\sqrt{3})^n = 
(2 + \sqrt{3})^n$ for $n \ge 1$. With $\alpha = 2 + \sqrt{3}$ and $\beta = 2 - \sqrt{3}$, we have $u_n = (\alpha^n + \beta^n)/2$ 
and $v_n = (\alpha^n - \beta^n)/(2\sqrt{3})$. We obtain infinitely many solutions to (a) by letting $a_n = 
x_n^2 - 1 = (u_n + v_n)^2 - 1 = (\alpha^{2n+1} - 4 + \beta^{2n+1})/6$. For (b), we compute $a_n a_{n+1} + 1 = 
[(\alpha^{2n+2} - 8 + \beta^{2n+2})/6]^2$. Since the $x_n$’s are integers, the $a_n$’s and $a_n a_{n+1} + 1$ are also 
integers. 
Editorial comment. One could also use recurrence relations for the quantities in the problem. 
For example, $a_{n+2} = 14a_{n+1} - a_n + 8$, so the sequence begins 8, 120, 1680, 23408, .... 
Several readers observed that $a_n a_{n+1} + 1 = [(a_n + a_{n+1})/4 - 1]^2$. Since part (a) follows 
from the reduction of the given condition to a Pell equation, several solvers replaced 3 by an 
arbitrary non-square m in that part. The proposer notes that the problem arose from a paper 
submitted to the New York City High School Math Fair (March, 1991), by Ms. Marianna 
Mayslich of James Madison High School, concerning sets S of positive integers such that, 
for some fixed integer t, ab + t is a square whenever a, b are distinct elements of S. This 
problem shows that $\{1, 3, a_n, a_{n+1}\}$ has that property for t = 1. Thus, there are infinitely 
many four-element sets with this property. The challenge is to find a larger set. A related 
question is whether $a_m a_n + 1$ can be a square when $|m - n| > 1$. 
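
The sequence is easy to generate and test numerically. The following Python sketch assumes the parameterization from the solution, x = u + v where (u, v) runs through the positive solutions of u^2 - 3v^2 = 1 starting from (2, 1); the function names are illustrative only.

```python
from math import isqrt


def is_square(m):
    """Return True when the integer m is a perfect square."""
    return m >= 0 and isqrt(m) ** 2 == m


def solutions(count):
    """Return the first `count` values a = x^2 - 1 obtained from the Pell equation."""
    u, v = 2, 1  # smallest positive solution of u^2 - 3 v^2 = 1
    out = []
    for _ in range(count):
        x = u + v  # then y = u + 3v, and y^2 - 3 x^2 = -2
        out.append(x * x - 1)
        # Multiply u + v*sqrt(3) by 2 + sqrt(3) to reach the next solution.
        u, v = 2 * u + 3 * v, u + 2 * v
    return out


def consecutive_products_are_squares(seq):
    """Check that a_n * a_{n+1} + 1 is a square for every consecutive pair."""
    return all(is_square(a * b + 1) for a, b in zip(seq, seq[1:]))
```

A direct search over small values of a is a useful cross-check that the Pell parameterization does not skip any solutions in that range.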


Solved by 80 other readers and the proposer, with one incorrect solution and two incomplete solutions submitted. 


A Voter’s Paradox with Majority Rule 


10252 [1992, 782]. Proposed by James S. Weber, The University of Illinois, Chicago, IL. 


An election is to be held with V voters who will rank A alternatives. It is said that 
alternative X is an “M-majority preference” over alternative Y if there are at least M 
voters who prefer X to Y. A “voter’s paradox cycle” is an ordering of the alternatives 
$a_0, a_1, \ldots, a_{A-1}, a_A = a_0$ so that $a_i$ is preferred over $a_{i+1}$ for $0 \le i < A$. Prove that a 
voter’s paradox cycle can exist for M-majority preference if and only if $AM \le V(A - 1)$. 


Solution by Robert High, New York, NY. First suppose such a cycle does exist and $AM > 
V(A - 1)$. For each $a_i$, the set $S_i$ of voters with $a_i$ not preferred to $a_{i+1}$ contains at most 
$V - M$ members. Since $M > V(A - 1)/A$, $S_i$ contains strictly fewer than $V/A$ 
members. The union of all the $S_i$ thus contains fewer than $V$ members. This says that the 
intersection of the complements of all the $S_i$ must be non-empty. But this intersection 
consists of those voters who prefer $a_i$ to $a_{i+1}$ for all $i$, a contradiction. 

Next, assume that $AM \le V(A - 1)$. Since $M$ is an integer, we must have $M \le 
\lfloor V(A - 1)/A \rfloor$. Note that $V = \lceil V/A \rceil + \lfloor V(A - 1)/A \rfloor$. We will construct a set of voter 
preferences (total orders of A) yielding a cycle. Let $a_0, a_1, \ldots, a_{A-1}$ be some ordering of 
the alternatives, and let $a_A = a_0$. Also fix an ordering of voters. Let $P = \lceil V/A \rceil$. For the 
first $P$ voters, specify that $a_0 < a_1$; then specify for the next $P$ voters that $a_1 < a_2$; and 
continue in this way until we run out of voters. Otherwise, let $a_i > a_{i+1}$ for each voter. 

Since $P \ge V/A$, we will exhaust the voters before we run out of alternatives. This means 
that, for each voter, there will be one $a_i$ that is not preferred to $a_{i+1}$ while $a_i$ is preferred 
to $a_{i+1}$ in all other cases. Thus, the order defined for each voter is consistent. Indeed, it is a 
total order in which each alternative is preferred to the next, starting from $a_{i+1}$. But, for 
each $a_i$, we have at least $\lfloor V(A - 1)/A \rfloor \ge M$ voters preferring $a_i$ to $a_{i+1}$. This completes 
the construction of the voter paradox cycle. 
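
The construction can be carried out mechanically. The Python sketch below is one reading of it, with illustrative names: voters are numbered from 0, alternatives are the integers 0 to A - 1, and each ranking lists alternatives from most to least preferred. A voter in block i ranks alternative i + 1 first and alternative i last, going around the cycle.

```python
def build_profile(V, A):
    """Return V rankings of A alternatives following the block construction."""
    P = -(-V // A)  # ceiling of V / A, computed without floating point
    profile = []
    for voter in range(V):
        i = voter // P  # index of the one pair (a_i, a_{i+1}) this voter reverses
        profile.append([(i + 1 + step) % A for step in range(A)])
    return profile


def support(profile, i, A):
    """Count voters who prefer alternative i to alternative i + 1 (mod A)."""
    nxt = (i + 1) % A
    return sum(ranking.index(i) < ranking.index(nxt) for ranking in profile)


def min_cycle_support(profile, A):
    """Smallest number of voters backing any single step of the cycle."""
    return min(support(profile, i, A) for i in range(A))
```

With V = 3 and A = 3 this reproduces the classical Condorcet example: each step of the cycle is supported by 2 of the 3 voters, so a cycle exists for M = 2, in agreement with AM = 6 = V(A - 1).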


Editorial comment. This result first appeared in the literature of social choice theory in 
J. Greenberg, “Consistent majority rules over compact sets of alternatives”, Econometrica 
47 (1979), 627-636. The proposer’s solution can be found in J. S. Weber, “An elementary 
proof of the conditions for a generalized Condorcet paradox”, Public Choice 77 (1993), 
415-419. 

On January 8, 1993, Robert High died in a rafting accident in Chile. This solution, 
his last contribution to this Problem Section, was received on January 2, 1993. His other 
interests included the game of Go, and he had just become President of the American Go 
Association at that time. A memorial article appears in the Winter 1993 issue of the Journal 
of that organization. He was a devoted follower of this Problem Section; the solution printed 
above is a reminder of how much he will be missed. 
Solved also by D. Callan, R. J. Chapman (U. K.), K. S. Kedlaya (student), O. P. Lossers (The Netherlands), R. Martin 
(student), R. Powers, K. Rebman, K. B. Reid, E. Schmeichel, Western Maryland College Problems group, University 
of Wyoming Problem Circle, and the proposer. One incomplete solution was received. 


Complex Roots of Special Quartic Polynomials 


10253 [1992, 782]. Proposed by W. Weston Meyer, General Motors Research Laboratories, 
Warren, MI. 


Show that the quartic equation 
$$z^4 - 2cz^3 + 2\bar{c}z - 1 = 0,$$ 
where $c$ is a complex number with complex conjugate $\bar{c}$, has a root not on the unit circle 
$\{z : |z| = 1\}$ if and only if $(\Re c)^{1/3} + (\Im c)^{1/3} i$ lies outside this circle. 


Solution I by W. O. Egerland and C. E. Hansen, ARL, Aberdeen Proving Ground, MD. We 
show that the roots of the given quartic equation lie on the unit circle if and only if $c$ lies 
inside or on the astroid $x + iy = \cos^3 t + i \sin^3 t$, $0 \le t < 2\pi$. A result of A. Cohn, 
specialized to the polynomial $p(z) = z^4 - 2cz^3 + 2\bar{c}z - 1$, states that the zeros of $p(z)$ 
lie on the unit circle and are simple if and only if the zeros of $p'(z) = 2(2z^3 - 3cz^2 + \bar{c})$ 
lie in $|z| < 1$ or, equivalently, if the zeros of the polynomial $g(z) = \bar{c}z^3 - 3cz + 2$ lie in 
$|z| > 1$ (see M. Marden, Geometry of Polynomials, American Mathematical Society, 1985, 
p. 206, Exercise 3). An application of Theorem 6.8b, on p. 493 of P. Henrici, Applied and 
Computational Complex Analysis, Vol. 1, Wiley, 1974, shows that this occurs if and only if 
$c = c_1 + ic_2$ satisfies $|c| < 1$ and $(c_1^2 + c_2^2 - 1)^3 + 27c_1^2 c_2^2 < 0$, i.e., if $c$ lies inside the 
astroid. If $c$ lies on the astroid, then $p(z) = (z - e^{it})^2 (z^2 + (ie^{-it} \sin 2t)z - e^{-2it})$. This 
completes the proof. 
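
The factorization on the astroid can be confirmed numerically by multiplying out the two factors. The Python sketch below assumes the reading p(z) = z^4 - 2c z^3 + 2 conj(c) z - 1 with c = cos^3 t + i sin^3 t; the helper names are illustrative only, and coefficient lists run from the leading term down.

```python
import cmath
import math


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (leading term first)."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def factorization_error(t):
    """Largest coefficient gap between p(z) and the claimed factored form at parameter t."""
    c = complex(math.cos(t) ** 3, math.sin(t) ** 3)  # point on the astroid
    target = [1, -2 * c, 0, 2 * c.conjugate(), -1]
    e = cmath.exp(1j * t)
    double_root = poly_mul([1, -e], [1, -e])  # (z - e^{it})^2
    quadratic = [1, 1j * math.sin(2 * t) / e, -1 / (e * e)]
    product = poly_mul(double_root, quadratic)
    return max(abs(a - b) for a, b in zip(product, target))
```

Evaluating `factorization_error(t)` over a range of `t` should return values at the level of floating-point rounding.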


Solution II by Anchorage Math Solutions Group, University of Alaska, Anchorage, AK. As 
in Solution I, let $p(z) = z^4 - 2cz^3 + 2\bar{c}z - 1$. Then $e^{i\theta}$ is a root of $p(z)$ if and only if 

$$e^{-2i\theta} p(e^{i\theta}) = e^{2i\theta} - 2ce^{i\theta} + 2\bar{c}e^{-i\theta} - e^{-2i\theta} = 0. \tag{1}$$ 

Write $c = x + iy = |c|e^{i\phi}$. Then (1) becomes $2i \sin 2\theta - 4i|c| \sin(\theta + \phi) = 0$, which 
simplifies to 

$$\frac{x}{\cos\theta} + \frac{y}{\sin\theta} = 1. \tag{2}$$ 

Thus, the roots of $p(z)$ on the unit circle correspond to the values of $\theta$ for which the line 
with intercepts $(\cos\theta, 0)$ and $(0, \sin\theta)$ contains the point $(x, y)$. These lines are precisely 
those for which the distance between intercepts is 1. 

Without loss of generality, we may assume that (x, y) lies in the first quadrant. An 
application of the intermediate value theorem gives a value of 6 in the second quadrant and 
one in the fourth quadrant with the required property. Any other solution would be in the 
first quadrant. 

Let D be the minimum distance between intercepts for lines through (x, y) meeting the 
positive halves of both axes. By calculus, this is found to be (x?/3 + y?/3)3/2, If D < 1, 
the intermediate value theorem gives two admissible values of 6 in the first quadrant. We 
have now found four roots of p(z) on the unit circle. If D > 1 the two solutions previously 
found are easily seen to be the only solutions. Thus p(z) has two roots on the unit circle 
and two roots off the circle. If D = 1, there is one new value of @. All roots lie on the unit 
circle in this case, though one of them is a double root. This has also been noted in Solution 
I. 
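The root count in Solution II can be checked numerically. The sketch below is only an illustration: it counts sign changes of $\sin 2\theta - 2(x\sin\theta + y\cos\theta)$, the equation obtained from (1) before dividing by $\sin\theta\cos\theta$, on a sampling grid. The function names and the grid size are assumptions made for this example, and the count is only reliable when the roots are simple and well separated.

```python
import math

def count_unit_circle_roots(x, y, samples=20000):
    """Count theta in [0, 2*pi) with sin(2*theta) = 2*(x*sin(theta) + y*cos(theta)).

    Roots are detected as sign changes between neighbouring grid points,
    so double roots (c exactly on the astroid) are not counted reliably.
    """
    def h(theta):
        return math.sin(2 * theta) - 2 * (x * math.sin(theta) + y * math.cos(theta))

    # Half-step offset keeps theta = 0, pi/2, pi, ... off the grid.
    values = [h(2 * math.pi * (j + 0.5) / samples) for j in range(samples)]
    # values[j - 1] wraps around for j = 0, which closes the circle.
    return sum(1 for j in range(samples) if values[j - 1] * values[j] < 0)

def astroid_quantity(x, y):
    """Return |x|^(2/3) + |y|^(2/3); values below 1 mean c is inside the astroid."""
    return abs(x) ** (2 / 3) + abs(y) ** (2 / 3)

if __name__ == "__main__":
    for c in [(0.1, 0.1), (0.6, 0.6)]:
        print(c, astroid_quantity(*c), count_unit_circle_roots(*c))
```

For $c = 0.1 + 0.1i$ (inside the astroid) the count is four, and for $c = 0.6 + 0.6i$ (outside) it is two, in line with the case analysis above.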


Solved also by J. Anglesio (France), F. Brulois, R. J. Chapman (U. K.), P. Deiermann, H. S. Gunaratne (Brunei), 
H. Kappus (Switzerland), K.-W. Lau (Hong Kong), O. P. Lossers (The Netherlands), T. L. McCoy, A. D. Melas (Greece), 
Y. Nievergelt, N. Passell, D. Tan, Western Maryland College Problems group, University of Wyoming Problem Circle, 
and the proposer. 


1995] PROBLEMS AND SOLUTIONS 277 


More Isogonal Configurations 


10293 [1993, 291]. Proposed by Moshe Rosenfeld, Pacific Lutheran University, Tacoma, WA. 


Suppose four distinct lines through the origin in $\mathbb{R}^3$ have the property that the six acute
angles between pairs of these lines are all equal. Prove that this configuration of four lines 
is isometric either to the diagonals of a cube or to a configuration of four of the six diagonals 
of a regular icosahedron. 


Solution I by Raphael M. Robinson, University of California, Berkeley, CA. Let the acute
angle between pairs of lines be $\theta$. The four lines will intersect the unit sphere in four pairs
of opposite points. We may take one point at the north pole. There will then be three other
points in the northern hemisphere at latitude $90^\circ - \theta$. The spherical distance between each
pair of points will be either $\theta$ or $180^\circ - \theta$. Let the differences in longitude of these three
points be $\alpha_1$, $\alpha_2$, and $\alpha_3$, so that $\alpha_1 + \alpha_2 + \alpha_3 = 360^\circ$. By the law of cosines for a spherical
triangle, we have
$$\cos^2\theta + \sin^2\theta\cos\alpha_i = \pm\cos\theta.$$
With the plus or minus sign, this leads to
$$\cos\alpha_i = \frac{\cos\theta}{1 + \cos\theta} \quad\text{or}\quad \cos\alpha_i = \frac{-\cos\theta}{1 - \cos\theta}.$$
Two of the values must be equal, say $\cos\alpha_1 = \cos\alpha_2$. Since $\alpha_1 + \alpha_2 < 360^\circ$, this implies
that $\alpha_1 = \alpha_2$, hence $\alpha_3 = 360^\circ - 2\alpha_1$.
If $\alpha_3$ is also equal to $\alpha_1$, then each $\alpha_i = 120^\circ$. Only the second equation relating $\cos\alpha_i$
and $\cos\theta$ is satisfied, and we see that $\cos\theta = 1/3$. The lines form the diagonals of a cube.
If $\alpha_3$ is not equal to $\alpha_1$, then we must have
$$\cos\alpha_1 = \frac{\cos\theta}{1 + \cos\theta} \quad\text{and}\quad \cos\alpha_3 = \frac{-\cos\theta}{1 - \cos\theta}$$
or vice versa. Since $\cos\alpha_3 = \cos 2\alpha_1$, this leads in the first case to
$$\frac{-\cos\theta}{1 - \cos\theta} = 2\left(\frac{\cos\theta}{1 + \cos\theta}\right)^2 - 1,$$
which reduces to $5\cos^2\theta = 1$, or $\cos\theta = 1/\sqrt{5}$. The second case leads to the same
conclusion. In the first case, we see that $\cos\alpha_1 = (\sqrt{5} - 1)/4$, hence $\alpha_1 = 72^\circ$, so that
$\alpha_2 = 72^\circ$ and $\alpha_3 = 216^\circ$. In the second case, we find that $\alpha_3 = 72^\circ$, hence $\alpha_1 = \alpha_2 = 144^\circ$.
Both cases produce four of the six diagonals of a regular icosahedron. The two figures are
congruent, since any pair of omitted diagonals can be taken into any other pair.
Thus there are exactly two configurations that satisfy the stated conditions.
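The angles quoted in Solution I are easy to confirm in floating point. The following is a minimal sketch; the helper name `longitude_gaps` is introduced only for this illustration. It evaluates the two possible longitude differences for a given value of the cosine of the common angle.

```python
import math

def longitude_gaps(cos_theta):
    """Return (alpha_plus, alpha_minus) in degrees for a given cos(theta).

    alpha_plus  solves cos(alpha) =  cos(theta) / (1 + cos(theta)),
    alpha_minus solves cos(alpha) = -cos(theta) / (1 - cos(theta)).
    """
    plus = math.degrees(math.acos(cos_theta / (1 + cos_theta)))
    minus = math.degrees(math.acos(-cos_theta / (1 - cos_theta)))
    return plus, minus

if __name__ == "__main__":
    # Cube case: all three gaps are alpha_minus.
    print(longitude_gaps(1 / 3))
    # Icosahedron case: gaps are (72, 72, 216) or (144, 144, 72).
    print(longitude_gaps(1 / math.sqrt(5)))
```

With the cosine equal to 1/3 the second value is 120 degrees, and with the cosine equal to $1/\sqrt{5}$ the two values are 72 and 144 degrees, so that $72 + 72 + 216 = 144 + 144 + 72 = 360$.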


Solution II by A. N. ’t Woord, University of Technology, Eindhoven, The Netherlands. Let
$v_1, v_2, v_3, v_4$ be unit vectors in $\mathbb{R}^3$ corresponding with the directions of the four lines. Let
$A$ be the $3 \times 4$ matrix $(v_1\, v_2\, v_3\, v_4)$. Define $G = A^T A = (v_i \cdot v_j)_{i,j=1}^{4}$. The matrix $G$ has
1 on the diagonal and has $\pm a$ outside the diagonal, where $a = \cos(\varphi)$ and $\varphi$ is the angle
between any two of the four lines. After a permutation of $v_1, \ldots, v_4$ and multiplying some
of the vectors $v_1, \ldots, v_4$ by $-1$ we may assume that $G$ is one of the following matrices:
$$\begin{pmatrix} 1 & a & a & a \\ a & 1 & a & a \\ a & a & 1 & a \\ a & a & a & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & a & a & a \\ a & 1 & -a & a \\ a & -a & 1 & a \\ a & a & a & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & a & a & a \\ a & 1 & -a & -a \\ a & -a & 1 & -a \\ a & -a & -a & 1 \end{pmatrix}.$$
Observe that the rank of $G$ is not greater than the rank of $A$ which is at most 3. It follows
that $\det(G) = 0$.

case 1:
$$G = \begin{pmatrix} 1 & a & a & a \\ a & 1 & a & a \\ a & a & 1 & a \\ a & a & a & 1 \end{pmatrix}$$
Now we get $0 = \det(G) = -(a - 1)^3(3a + 1)$, which leads to a contradiction because
$0 < a < 1$.

case 2:
$$G = \begin{pmatrix} 1 & a & a & a \\ a & 1 & -a & a \\ a & -a & 1 & a \\ a & a & a & 1 \end{pmatrix}$$
Here we get $0 = \det(G) = (a - 1)(a + 1)(5a^2 - 1)$, so $a = 1/\sqrt{5}$ and $\varphi \approx 63.4349^\circ$.

case 3:
$$G = \begin{pmatrix} 1 & a & a & a \\ a & 1 & -a & -a \\ a & -a & 1 & -a \\ a & -a & -a & 1 \end{pmatrix}$$
Now we get $0 = \det(G) = -(3a - 1)(a + 1)^3$, so $a = 1/3$ and $\varphi \approx 70.5288^\circ$.

The matrix $G$ determines $A$ up to a left-multiplication of an orthogonal matrix. The
second case corresponds with four of the six diagonals of an icosahedron. The third case
corresponds with the diagonals of a cube.
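The three determinant factorizations can be confirmed with exact rational arithmetic. The sketch below is an illustration under stated assumptions: `det` and `gram` are helper names introduced here, the sign patterns are those of the three cases above (index pairs are 0-based), and the factorizations are only spot-checked at a few rational values of $a$ rather than proved symbolically.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination over exact rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def gram(a, negative_pairs):
    """4x4 Gram matrix with 1 on the diagonal, -a at the listed pairs, +a elsewhere."""
    a = Fraction(a)
    g = [[Fraction(1) if i == j else a for j in range(4)] for i in range(4)]
    for i, j in negative_pairs:
        g[i][j] = g[j][i] = -a
    return g

CASE1 = []
CASE2 = [(1, 2)]
CASE3 = [(1, 2), (1, 3), (2, 3)]

if __name__ == "__main__":
    a = Fraction(1, 3)
    print(det(gram(a, CASE1)), det(gram(a, CASE2)), det(gram(a, CASE3)))
```

At $a = 1/3$ the third determinant vanishes exactly, which is the cube configuration; the icosahedral value $a = 1/\sqrt{5}$ is irrational and so can only be checked through the factor $5a^2 - 1$.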


Editorial comment. With the goal of producing a “proof without words”, Mario Barra 
submitted some drawings illustrating the addition of a fourth line to a configuration of three 
lines. The effort is appreciated; however, the words of the selected solutions seemed more
convincing.

The title is borrowed from the article, Timothy Murdoch, “Isogonal configurations”, this 
MONTHLY, April 1993. 


Solved also by M. Barra (Italy), V. Bozin (student, Yugoslavia), R. J. Chapman (U. K.), I. Kastanas, O. P. Lossers (The 
Netherlands), A. D. Melas (Greece), H. Morris, A. Pedersen (Denmark), F, Schmidt, R. Stong, M. Vowe (Switzerland), 
Anchorage Math Solutions Group, and the proposer. 


Sums of C-polynomials 


10297 [1993, 291]. Proposed by Zalman Rubinstein, University of Haifa, Haifa, Israel. 


Let p(x) be a polynomial of degree n. 

(a) Show that $p(x)$ can be written as a sum of four polynomials $q_0(x)$, $q_1(x)$, $q_2(x)$,
$q_3(x)$, each of degree at most $n$ with all roots of all $q_i(x)$ lying on the unit circle $\{x : |x| = 1\}$.

(b)* Is there a polynomial $p(x)$ which can not be expressed as a sum of fewer than 4
such $q_i(x)$?


Solution by Richard Stong, Rice University, Houston, TX. We show that 3 such $q_i(z)$ suffice
for any $p(z)$. First note that $f(z) = (2i)^{-n}(z - 1)^n g(i(z + 1)/(z - 1))$ is a polynomial
of degree exactly $n$ with all of its roots on $\{z : |z| = 1\}$ if and only if
$g(z) = (z - i)^n f((z + i)/(z - i))$ is a polynomial of degree exactly $n$ with all real roots.
Therefore it suffices to express any polynomial of degree at most $n$ as a sum of three polynomials
of degree exactly $n$ with all real roots.
Let $p(z) = f(z) + ig(z)$ where $f$ and $g$ are polynomials with real coefficients. Choose
a constant $C$ with $|f(x)| < C$ and $|g(x)| < C$ for $x \in [-1, n]$. Let
$$q_0(z) = (1 + i)Kz(z - 1)\cdots(z - n + 1)$$
for some large $K$ ($K > 4C$ suffices); let
$$q_1(z) = f(z) - Kz(z - 1)\cdots(z - n + 1)$$
and
$$q_2(z) = i\bigl(g(z) - Kz(z - 1)\cdots(z - n + 1)\bigr).$$
Clearly $p(z) = q_0(z) + q_1(z) + q_2(z)$ and $q_0(z)$ has only real roots. Furthermore, if $K$ is
large enough, then the polynomial $Kz(z - 1)\cdots(z - n + 1)$ will exceed $C$ in magnitude
at the points $z = -1/2, 1/2, \ldots, n - 1/2$ and the values at these points will alternate in
sign. Since $f(z)$ and $g(z)$ lie in $[-C, C]$ for these $z$, the polynomials $q_1(z)$ and $q_2(z)$ are
not identically zero and each must have a root in each of the intervals $[k - 1/2, k + 1/2]$,
$0 \le k \le n - 1$. Since they are polynomials of degree at most $n$, these must be all roots, and
the polynomials have exact degree $n$.


Editorial comment. The solvers listed below all solved part (a); both Richard Stong and 
Antonios D. Melas showed that three polynomials suffice. Thus part (b) has a negative 
answer, and it is natural to inquire whether two polynomials will suffice. In fact, this 
question was addressed in the solution of Melas. The idea of the construction is to find a 
polynomial G(z) of degree 4 that cannot be the sum of two polynomials with real roots, 
and apply the transformation used in the selected solution with n = 4 to obtain F(z). The 
inverse transformation takes a C-polynomial of degree 4 into a polynomial with only real 
roots multiplied by a power of z — i. This requires a study of G(z) at z = i. These 
considerations lead to choosing $G(z) = z^4 + 4iz + 1$. The fact that $G'(i) = 0$ is used to rule
out the possibility of z = i being a multiple root of one of the summands. If G(z) is a sum 
of two polynomials with only real roots, its zero coefficients will lead to zero coefficients 
of the terms of degree 2 and 3 in the summands. However, such polynomials cannot have 
only real roots. The possibility that one of the summands in the representation of G(z) has 
a simple root at z = i is handled similarly. A detailed proof would be too long to reproduce 
here. 


Solved also by I. Kastanas, O. P. Lossers (The Netherlands), F. Schmidt, and the proposer. 


Collaborating editors: David F. Appleyard, Paul T. Bateman, Bruce C. Berndt, Duane 
M. Broline, Barry W. Brunson, Frank S. Cater, Gulbank D. Chakerian, Underwood Dudley, 
Gerald A. Edgar, Michael A. Filaseta, Ira M. Gessel, Richard A. Gibbs, Jerrold R. Griggs, 
Douglas A. Hensley, John R. Isbell, Mourad E. H. Ismail, Murray Klamkin, Daniel J. Kleitman, 
Frederick W. Luttmann, Frank B. Miles, Richard Pfiefer, Stephen L. Portnoy, J. O. Shallit, John 
Henry Steelman, Kenneth B. Stolarsky, David E. Tepper, Douglas B. Tyler, Daniel Ullman, and 
William E. Watkins. 
Gale’s Round-Trip Jeep Problem


Alan Hausrath, Bradley Jackson, John Mitchem, 
Edward Schmeichel 


In 1947 Fine [Fin] introduced and solved a problem of maximizing the distance a 
jeep can travel into the desert using n drums of fuel. Subsequently, Phipps [Phi],
Alway [Alw], and Gale [Gal] gave other solutions to the original problem or 
considered related problems. As mentioned in [Fin], the original problem is similar 
to one which arose in air transport operations in the China theater during World 
War II, and it has been suggested that there may be applications to Arctic 
expeditions and interplanetary travel. 

Near the end of [Gal], the author states, “An apparently simple question is the
round trip problem in which fuel is available at both ends of the desert, but I must 
confess ...that I have not been able to find the solution. It is not hard to see that 
one can do at least as well in this case as in the case of two jeeps making one-way 
trips, but it may be possible to do better. The difficulty here as with many 
optimization problems is that there does not appear to be any simple way to 
determine whether or not a given solution is optimal.” 

Gale’s problem can be interpreted in two equivalent ways. (i) Given unlimited 
fuel at each end of a desert of given length, find a round trip across the desert 
which uses as little fuel as possible. (ii) Given a fixed amount of fuel which can be 
distributed between the two ends of a desert, find the maximum length desert 
which can be crossed in a round trip using the available fuel. We find it convenient 
to consider (ii) and give an optimal solution for it. We also describe a solution for 
the analogous round trip problem where the two allowed depots may be placed 
anywhere in the desert. 

In each of the above problems the jeep can carry exactly 1 drum. It is implicit 
that the jeep can store whatever fraction of a drum is desired at any point in the 
desert. (Perhaps the driver carries large plastic bags for fuel storage.) In [Dew], 
Dewdney proposed an interesting variation of the one-way problem. Although 
Dewdney’s problem was given in terms of drums, gallons, and miles, it can be 
rephrased as follows: Find the maximum distance a jeep can travel into the desert 
using m drums of fuel where the jeep can carry 1 drum plus 1/5 of a drum in its 
tank, but only drums can be stored. That is the jeep can dump at most 5/6 of its 
fuel capacity in the desert. It is interesting to note that Dewdney’s problem has 
been solved as a linear programming problem; an optimal algorithm for Dewdney’s 
problem appears in [Jac]. But the problems solved in this paper apparently are not 
easily posed as either linear or dynamic programming problems. In [Gal], Gale also 
points out that “there is a feeling among many people that the original jeep 
problem can be solved by the functional equation method of dynamic program- 
ming...I know of no way of solving the problem by this method.” 


1. A BRIEF DESCRIPTION OF THE SOLUTION TO GALE’S PROBLEM. In 
solving Gale’s problem we will start by considering the longest desert which can be 
crossed in a round trip if there are m drums of fuel at the start S and k drums of 
fuel at the finish F. Let D(f) denote the length of the optimal one-way trip using
f drums of fuel. If m ≤ k, it is clear that one can do no better than D(m) and
should use the S-fuel outbound and the F-fuel returning. For m > k, going D(m)
outbound will not work as the jeep is unable to return to S. Instead, in order to
make full use of the drums at S, on the outbound trip a number of depots are
created leaving fuel for the return. Let T denote the location of the depot furthest
from S. We prove that the following highly plausible qualitative conditions determine
an optimal solution: (i) Use only S-fuel when going from S to F. (ii) Use only
F-fuel when returning from F to T. (iii) Use only S-fuel stored at the depots, when
returning from T to S. The solution then follows by putting together solutions of
previously solved jeep problems. Thus it follows from (ii) that the distance from F
to T is D(k), and the distance from T to S is obtained by solving a slight variation
of the well-known round trip jeep problem with fuel only at S.

To finish Gale’s problem, we need only find the optimal distribution of the 
available fuel between S and F. 


2. ORIGINAL PROBLEMS. We have x drums of fuel available at the edge of the 
desert and a jeep which can carry at most 1 drum. Here we give the well-known 
algorithm for maximizing the one-way distance, and an algorithm for maximizing 
the round trip distance for the jeep using x drums of which k must be delivered to 
F. One unit of distance will be the distance that the jeep can travel on one drum of 
fuel. We assume that the jeep’s efficiency is constant. It does not depend on wind, 
weather, weight, or depth of the ruts in the sand. The algorithms and their 
optimality proofs which we give are based on work appearing in [Gal], [Phi], [Fin], 
and [Niv, Section 10.9]. 


Theorem A. Given $n + f$, $0 \le f < 1$, drums of fuel at the start and a jeep with
capacity of 1 drum, the maximum one-way distance which the jeep can travel is
$$D_1 = 1 + \frac{1}{3} + \frac{1}{5} + \cdots + \frac{1}{2n-1} + \frac{f}{2n+1}.$$

Proof: We begin with an algorithm which achieves distance $D_1$. First assume
$f = 0$. Repeat $n$ times: Put 1 drum of fuel into the jeep, drive forward $1/(2n - 1)$
units, store $1 - 2/(2n - 1)$ units, and return to the previous fuel dump, except on
the $n$th iteration do not return. We now have
$$(n - 1)\left[1 - \frac{2}{2n-1}\right] + 1 - \frac{1}{2n-1} = n - 1$$
drums of fuel, and the jeep at distance $1/(2n - 1)$ from the previous dump. Iterate
this process, replacing $n$ successively by $n - 1, n - 2, \ldots, 1$.

If $f > 0$, begin the above process by first moving all $n + f$ drums forward
$f/(2n + 1)$ units, thus delivering $n$ full drums to the first fuel dump.

In order to show that $D_1$ is the maximum attainable distance when $f = 0$, for
any integer $i$, $0 \le i \le n$, we let $x_i$ denote the point on segment SF such that the
total distance traveled on the right of $x_i$ is $i$. (We emphasize that the distance
traveled is on the right, not ‘toward’ the right.) Then $x_n = S$ and $x_0 = F$. Since the
jeep used exactly $k$ units of fuel while traveling on the right of $x_k$, at least $k$ units
must arrive at $x_k$. Let P be any point between $x_{k+1}$ and $x_k$.

Then more than $k$ units of fuel must have crossed P, and thus the jeep crossed
P while traveling toward the right at least $k + 1$ times. It follows that the jeep
crossed P at least $k$ times while going toward the left. Hence the jeep crossed P
at least $2k + 1$ times and the distance between $x_{k+1}$ and $x_k$, denoted $(x_{k+1}, x_k)$, is
at most $1/(2k + 1)$. Thus we have that the distance from S to F is at most
$$(x_n, x_{n-1}) + (x_{n-1}, x_{n-2}) + \cdots + (x_1, x_0) \le \frac{1}{2n-1} + \frac{1}{2n-3} + \cdots + \frac{1}{3} + 1 = D_1.$$
Similarly when $f > 0$, the distance from S to $x_n$ is at most $f/(2n + 1)$. This
completes the proof of Theorem A.
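The one-way algorithm of Theorem A can be traced stage by stage with exact fractions. The sketch below is an illustration, not the authors' code: the function names are chosen here, and the simulation covers only the case of a whole number of drums. At the stage with $j$ drums at the current dump it drives legs of length $1/(2j-1)$, checks that exactly $j - 1$ drums end up at the next dump, and adds up the legs.

```python
from fractions import Fraction

def one_way_distance(n, f=Fraction(0)):
    """Distance of Theorem A: 1 + 1/3 + ... + 1/(2n-1) + f/(2n+1)."""
    return sum(Fraction(1, 2 * j - 1) for j in range(1, n + 1)) + Fraction(f) / (2 * n + 1)

def simulate_one_way(n):
    """Follow the f = 0 algorithm and return the distance reached."""
    position = Fraction(0)
    for stage in range(n, 0, -1):          # 'stage' drums sit at the current dump
        leg = Fraction(1, 2 * stage - 1)
        fuel_at_new_dump = Fraction(0)
        for trip in range(stage):
            if trip < stage - 1:
                # Burn one leg out, keep one leg's worth to drive back, store the rest.
                fuel_at_new_dump += 1 - 2 * leg
            else:
                # Last trip of the stage: no return, everything left is stored.
                fuel_at_new_dump += 1 - leg
        assert fuel_at_new_dump == stage - 1
        position += leg
    return position

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, simulate_one_way(n), one_way_distance(n))
```

For three drums both functions give $1 + 1/3 + 1/5 = 23/15$ units of distance.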


Theorem B. Let $m, k$ be integers, $0 \le f, g < 1$, and $m + g \ge k + f$. Given $m + g$
drums of fuel at S and a jeep with capacity 1 drum, the maximum round trip distance
in which the jeep delivers $k + f$ drums to F is
$$D_2 = \begin{cases}
\dfrac{g - f}{2m + 2}, & \text{if } m = k; \\[2ex]
\dfrac{g}{2m+2} + \dfrac{1}{2m} + \dfrac{1}{2m-2} + \cdots + \dfrac{1}{2k+4} + \dfrac{1-f}{2k+2}, & \text{if } m \ge k + 1.
\end{cases}$$

Proof: First consider the case when $m \ge k + 1$. For any integer $i$, $m + g \ge i \ge
k + f$, let $x_i$ be the point such that exactly $i$ units of S-fuel are used to the right of
that point or delivered to F. Let P be a point between $x_i$ and $x_{i-1}$, for
$m \ge i \ge k + 2$. Then P is crossed at least $2i$ times using S-fuel and
$\mathrm{dist}(x_i, x_{i-1}) \le 1/2i$. Also if P is between S and $x_m$, then P is crossed at least
$2m + 2$ times using S-fuel so that $\mathrm{dist}(S, x_m) \le g/(2m + 2)$. Similarly if P
is between $x_{k+1}$ and F, then P is crossed at least $2k + 2$ times using S-fuel
and $\mathrm{dist}(x_{k+1}, F) \le (1 - f)/(2k + 2)$. Thus the distance between S and F is at
most $D_2$.

We now give an algorithm which uses $m + g$ drums of fuel for a round trip of
length $D_2$ and delivers $k + f$ drums of that fuel to F. At S, repeat $m + 1$ times:
Put $(m + g)/(m + 1)$ units into the jeep, drive forward $g/(2m + 2)$ units and
leave all fuel except just enough to return to S. On the last trip do not return to S
but leave all fuel at this dump. The total amount of fuel at the dump at this stage is
$(m + g) - (2m + 1)(g/(2m + 2)) = m + (g/(2m + 2))$. Leave $g/(2m + 2)$ for the
return journey. For each $i$, $m \ge i \ge k + 2$, fill the jeep $i$ times and each time go
forward $1/2i$ units. On each of the first $i - 1$ times dump $1 - (1/i)$ units and
return. The last time dump $1 - (1/2i)$ units. Now we use $i - 1$ units to go on to
the next fuel dump, having left $1/2i$ for the return. Finally, use $1 - f$ units to
make $k + 1$ round trips between $x_{k+1}$ and F. This delivers $(k + 1) - (1 - f) =
k + f$ units to F at distance $(1 - f)/(2k + 2)$ from $x_{k+1}$. This completes the proof
when $m \ge k + 1$. The proof when $m = k$ is analogous.
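The distance of Theorem B and the fuel bookkeeping behind it can be written down directly. This is a minimal sketch with hypothetical function names; it assumes the crossing counts used in the proof ($2m+2$ crossings of the first leg, $2i$ of the leg for stage $i$, and $2k+2$ of the last leg) and checks that the fuel burned equals the fuel supplied minus the fuel delivered.

```python
from fractions import Fraction

def round_trip_delivery_distance(m, g, k, f):
    """Round trip distance of Theorem B for m+g drums at S and k+f drums delivered to F."""
    g, f = Fraction(g), Fraction(f)
    if m == k:
        return (g - f) / (2 * m + 2)
    middle = sum(Fraction(1, 2 * i) for i in range(k + 2, m + 1))
    return g / (2 * m + 2) + middle + (1 - f) / (2 * k + 2)

def fuel_burned(m, g, k, f):
    """Fuel used when each leg is crossed exactly as often as the proof's lower bound."""
    g, f = Fraction(g), Fraction(f)
    if m == k:
        return (2 * m + 2) * (g - f) / (2 * m + 2)
    burned = (2 * m + 2) * g / (2 * m + 2)
    burned += sum(2 * i * Fraction(1, 2 * i) for i in range(k + 2, m + 1))
    burned += (2 * k + 2) * (1 - f) / (2 * k + 2)
    return burned

if __name__ == "__main__":
    # With nothing delivered (k + f = 0) this is the classical round trip with fuel only at S.
    print(round_trip_delivery_distance(2, 0, 0, 0))
    print(fuel_burned(5, Fraction(3, 4), 2, Fraction(1, 2)))
```

For two drums and no delivery the distance is $1/4 + 1/2 = 3/4$, half of the harmonic number $H_2$.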

We observe that when k + f = 0, Theorem B gives the well-known maximum 
length desert which can be crossed in a round trip using only fuel from S. 


3. ROUND TRIPS WITH 2 FUEL DEPOTS. We consider two different problems, 
one in which we have the depots at each end of the desert, and one in which we 
can put the second depot at any point. Of course the first depot must be at the 
start. In both problems we want to maximize the length of the desert which can be 
crossed for a fixed amount of fuel. 

We begin with a theorem which gives the maximum length desert which can be 
crossed given that some fixed amounts of fuel are available at each end of the 
desert. 
Suppose that a total of x drums of fuel are available. One way to proceed is to
divide the fuel equally between S and F. Then use the one-way algorithm for
traveling from S to F and for returning to S from F. We show, however, that it is
more efficient to allocate less than half the fuel to F. In fact the maximum
distance is achieved when F receives only $k = \lfloor((x + 1)/2)^{1/2}\rfloor$ drums of fuel
and $x - k$ drums are available at S.

If $x - k_1$ ($k_1$) drums of fuel are available at S (F), where $k_1 \le x/2$, Theorem 1
establishes the maximum desert length that can be crossed. The algorithm for
attaining that distance establishes a point T between S and F. The distance from
T to F is the maximum one-way distance which can be crossed using $k_1$ drums of
fuel. The jeep will travel from S to T in the same manner as the round trip
algorithm given in Theorem B. It will deliver $k_1$ drums of fuel to T, which will be
used on a one-way trip from T to F, plus a small amount of extra fuel for the
return trip to S. On the return trip the jeep uses the $k_1$ drums of fuel at F to
reach T, and from T uses the fuel at each depot to reach the next depot eventually
returning to S. These ideas are illustrated in Figures 1 and 2 for $5\frac{3}{4}$ drums at S
and $2\frac{1}{2}$ drums at F. In Figure 1 at each $D_i$ the pair $(r, s)$ denotes $r$ units of fuel
delivered to $D_i$ for use to the right of $D_i$ and $s$ units delivered to $D_i$ to be used
on the return trip to S. In Figure 2 the number at each $D_i$ is the amount of fuel
used on the return trip to the left of $D_i$.


Figure 1. Outbound Trip from S to F (depots in order: S, $D_1$, $D_2$, $D_3$, $D_4 = T$, $D_5$, $D_6$, F)


Figure 2. Return Trip from F to S (depots in order: S, $D_1$, $D_2$, $D_3$, $D_4 = T$, $D_5$, $D_6$, F)


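Before stating and proving Theorem 1 we make a definition which for $m = 5$,
$g = \frac{3}{4}$, $k = 2$, $f = \frac{1}{2}$ gives the distance shown in Figure 1 (as well as in Figure 2).
For $m, k$ integers, $0 \le f, g < 1$, and $m + g \ge k + f$ we define
$$d(m + g, k + f) = D_2 + \frac{f}{2k+1} + \frac{1}{2k-1} + \cdots + \frac{1}{3} + 1, \tag{1}$$
where $D_2$ is the round trip distance given in Theorem B.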
Theorem 1. Suppose there are $m + g$ drums of fuel at S and $k + f$ drums at F,
where $m$ is a positive integer, $k$ is a non-negative integer, $m + g \ge k + f$, and $0 \le f$,
$g < 1$. Then the maximum distance between S and F is $d(m + g, k + f)$.
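As a worked instance of the distance in Theorem 1, the sketch below lists the individual legs between consecutive depots and sums them. It is an illustration under assumptions: the function names are invented here, and the sample values $5\frac{3}{4}$ drums at S and $2\frac{1}{2}$ drums at F are a reading of the partly illegible figures in the source, not a confirmed transcription.

```python
from fractions import Fraction

def leg_lengths(m, g, k, f):
    """Legs from S to T (Theorem B part) and from T to F (Theorem A part)."""
    g, f = Fraction(g), Fraction(f)
    if m == k:
        s_to_t = [(g - f) / (2 * m + 2)]
    else:
        s_to_t = [g / (2 * m + 2)]
        s_to_t += [Fraction(1, 2 * i) for i in range(m, k + 1, -1)]  # i = m, ..., k+2
        s_to_t.append((1 - f) / (2 * k + 2))
    t_to_f = [f / (2 * k + 1)] + [Fraction(1, 2 * j - 1) for j in range(k, 0, -1)]
    return s_to_t, t_to_f

def d(m, g, k, f):
    """Total desert length: sum of all legs from S to F."""
    s_to_t, t_to_f = leg_lengths(m, g, k, f)
    return sum(s_to_t) + sum(t_to_f)

if __name__ == "__main__":
    s_to_t, t_to_f = leg_lengths(5, Fraction(3, 4), 2, Fraction(1, 2))
    print([str(x) for x in s_to_t], [str(x) for x in t_to_f])
    print(d(5, Fraction(3, 4), 2, Fraction(1, 2)))
```

Under these assumed values the legs are 1/16, 1/10, 1/8, 1/12 from S to T and 1/10, 1/3, 1 from T to F, for a total of 433/240 units.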


Before proving the theorem we make some definitions and prove a lemma. A 
feasible solution to a jeep problem is any trip which obeys the rules of the problem. 
The value of the feasible solution is the distance from S to F, and an optimal 
solution is any feasible solution with maximum value. 


Lemma 1. For any feasible solution $\mathcal{S}$ that entails S-fuel arriving by jeep at point
F, there is a new feasible solution $\mathcal{S}'$ at least as large in which no S-fuel arrives
at F.


Proof of Lemma 1. Suppose S-fuel arrives at F. Without loss of generality we may
assume that when S-fuel first arrives at F, say at time $t_0$, there is F-fuel still there.
After time $t_0$ each time the jeep leaves F let the percentage of S-fuel in the tank
equal the percentage of S-fuel at F. Let M be the minimum amount of F-fuel
which leaves F after time $t_0$. Let F′ = max{F − M, the return trip turnaround
point closest to F, return trip fuel dump closest to F but different from F}.

In the new solution $\mathcal{S}'$ the jeep does exactly the same movement as before,
but takes no S-fuel to F; instead S-fuel is left at F′. On the return trip S-fuel
which was previously taken on at F is added at F′. This proves Lemma 1.


Proof of Theorem 1. By Lemma 1 we may restrict our attention to feasible
solutions in which no S-fuel arrives at F. We define the return trip as all travel after
the jeep’s first arrival at F. Let T be the point in [S, F] farthest from F which can
be reached from F using only F-fuel. By Theorem A,
$$\mathrm{dist}(T, F) = 1 + \frac{1}{3} + \cdots + \frac{1}{2k-1} + \frac{f}{2k+1}.$$
We use $y$ to denote this distance.
We give now another lemma.


Lemma 2. Suppose there is a feasible solution $\mathcal{S}$ in which S-fuel is used to the
right of T on the return trip. Then there is a new feasible solution $\mathcal{S}'$ at least as
large in which no S-fuel is used to the right of T on the return trip.


Proof of Lemma 2. Change $\mathcal{S}$ to $\mathcal{S}'$ as follows. On the outbound trip the
travel is exactly the same as in $\mathcal{S}$ except any S-fuel stored in (T, F] for use on
the return is instead stored at T. In the return trip of $\mathcal{S}'$ the first travel is that
required to cover the distance $y$ from F to T. In traveling from T back to S the
jeep travels as it did in $\mathcal{S}$, taking on fuel at T for travel to the left. This is
possible because in $\mathcal{S}'$ we have at least as much fuel available at T as in $\mathcal{S}$
and the same fuel between S and T as in $\mathcal{S}$. This completes the proof of
Lemma 2.

By Lemma 2 we may consider only feasible solutions in which dist(T, F) = $y$
and $k + f$ units of S-fuel must arrive at T for the outbound journey. Hence we
only need to maximize dist(S, T), given that $k + f$ units of fuel must arrive at T
for the outbound trip, and only $m + g$ units are available at S. An optimal
distance and algorithm are given in Theorem B.

This completes the proof of Theorem 1. 
Theorem 2 gives the maximum length desert which can be crossed using x
drums of fuel divided between S and F. It shows that $k = \lfloor((x + 1)/2)^{1/2}\rfloor$
drums should be placed at F, and thus the distance from T to F will be the
maximum one way distance $D_1$, as given in Theorem A, that can be traveled using
k drums of fuel.

Table 1 gives the number of drums at F for various values of x. 


TABLE 1

x (total number of drums):    [71, 97)
k (number of drums at F):     6

In order to prove Theorem 2 it is convenient to first prove another lemma, 
which shows that we can assume there are an integer number of drums at F. 


Lemma 3. Let $m$ and $k$ be integers, $0 \le f, g < 1$, $m + g \ge k + f$, and $m + g +
k + f \ge 2$. Then (i) $d(m + g, k + f) \le d(m + f + g, k)$, or (ii) $d(m + g, k + f)
\le d(m + f + g - 1, k + 1)$.


(Note: If $m + f + g - 1 < k + 1$, the right hand side of (ii) is undefined. But in
that case, we will show that (i) holds.)
Proof: Suppose first that $f + g < 1$. Then we have
$$d(m + f + g, k) - d(m + g, k + f) = f\left[\frac{1}{2m+2} - \frac{1}{2k+1} + \frac{1}{2k+2}\right], \tag{2}$$
which is positive when $m = k$ or $k + 1$. Now let $m \ge k + 2$. Then we also have
$$d(m + f + g - 1, k + 1) - d(m + g, k + f) = -(1 - f)\left[\frac{1}{2m} - \frac{1}{2k+1} + \frac{1}{2k+2}\right] + g\left[\frac{1}{2m} - \frac{1}{2m+2}\right]. \tag{3}$$
Since
$$\frac{1}{2m} - \frac{1}{2m+2} > 0,$$
the right side of either (2) or (3) will be nonnegative unless
$$\frac{1}{2m+2} < \frac{1}{2k+1} - \frac{1}{2k+2} < \frac{1}{2m},$$
or equivalently
$$\frac{1}{2m+2} < \frac{1}{(2k+1)(2k+2)} < \frac{1}{2m}.$$
But this forces $(2k + 1)(2k + 2)$, which is an even integer, to be $2m + 1$. This
impossibility completes the proof when $f + g < 1$.
Suppose next that $1 \le f + g < 2$. Then we have
$$d(m + f + g, k) - d(m + g, k + f) = f\left[\frac{1}{2m+4} - \frac{1}{2k+1} + \frac{1}{2k+2}\right] + (1 - g)\left[\frac{1}{2m+2} - \frac{1}{2m+4}\right]. \tag{4}$$
The right hand side of (4) is positive when $m = k$. Now let $m \ge k + 1$. Then we
also have
$$d(m + f + g - 1, k + 1) - d(m + g, k + f) = -(1 - f)\left[\frac{1}{2m+2} - \frac{1}{2k+1} + \frac{1}{2k+2}\right]. \tag{5}$$
Now an argument precisely analogous to the one above shows that the right side
of either (4) or (5) is nonnegative, which completes the proof of Lemma 3.
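Lemma 3 lends itself to a brute-force spot check over small parameter values. The sketch below is a sanity check, not a proof. It assumes that $d$ is the Theorem B distance from S to T plus the Theorem A distance from T to F, takes total fuel amounts as arguments and splits them into integer and fractional parts; the function names are introduced only for this illustration.

```python
from fractions import Fraction
from math import floor

def d(total_s, total_f):
    """Desert length for total_s drums at S and total_f drums at F (total_s >= total_f)."""
    m, k = floor(total_s), floor(total_f)
    g, f = Fraction(total_s) - m, Fraction(total_f) - k
    if m == k:
        head = (g - f) / (2 * m + 2)
    else:
        head = g / (2 * m + 2) + (1 - f) / (2 * k + 2)
        head += sum(Fraction(1, 2 * i) for i in range(k + 2, m + 1))
    tail = f / (2 * k + 1) + sum(Fraction(1, 2 * j - 1) for j in range(1, k + 1))
    return head + tail

def lemma3_holds(m, g, k, f):
    """True if alternative (i) or a well-defined alternative (ii) is at least as good."""
    base = d(m + g, k + f)
    if d(m + f + g, Fraction(k)) >= base:
        return True
    shifted = m + f + g - 1
    return shifted >= k + 1 and d(shifted, Fraction(k + 1)) >= base

if __name__ == "__main__":
    quarters = [Fraction(q, 4) for q in range(4)]
    cases = [(m, g, k, f) for m in range(1, 9) for k in range(0, m + 1)
             for g in quarters for f in quarters
             if m + g >= k + f and m + g + k + f >= 2]
    print(len(cases), all(lemma3_holds(*c) for c in cases))
```

The same function `d` also reproduces the right side of (2) whenever the two fractional parts sum to less than 1, which gives some assurance that the reconstruction of the formula is consistent.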


Theorem 2. Given $x \ge 2$ drums of fuel divided between depots at each end of the
desert, the maximum distance which can be crossed in a round trip is $d(x - k, k)$
where $k = \lfloor((x + 1)/2)^{1/2}\rfloor$. Furthermore the algorithm for achieving this distance
is given in the proofs of Theorems A and B.
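The allocation rule of Theorem 2 can be compared against an exhaustive search over whole numbers of drums at F. This is a minimal sketch with hypothetical names; it assumes $k$ is the integer part of the square root of $(x+1)/2$, and it compares maximum values rather than maximizers because ties occur (for example at $x = 7$, where one drum and two drums at F give the same distance).

```python
from fractions import Fraction
from math import floor, isqrt

def d(total_s, total_f):
    """Desert length for total_s drums at S and total_f drums at F (total_s >= total_f)."""
    m, k = floor(total_s), floor(total_f)
    g, f = Fraction(total_s) - m, Fraction(total_f) - k
    if m == k:
        head = (g - f) / (2 * m + 2)
    else:
        head = g / (2 * m + 2) + (1 - f) / (2 * k + 2)
        head += sum(Fraction(1, 2 * i) for i in range(k + 2, m + 1))
    tail = f / (2 * k + 1) + sum(Fraction(1, 2 * j - 1) for j in range(1, k + 1))
    return head + tail

def optimal_k(x):
    """Integer part of sqrt((x + 1) / 2), computed exactly."""
    return isqrt(floor((Fraction(x) + 1) / 2))

def best_integer_split(x):
    """Largest d(x - k, k) over all whole numbers k of drums at F with k <= x / 2."""
    x = Fraction(x)
    return max(d(x - k, Fraction(k)) for k in range(0, floor(x / 2) + 1))

if __name__ == "__main__":
    for x in [2, Fraction(5, 2), 7, 10, 71, 97]:
        k = optimal_k(x)
        print(x, k, d(Fraction(x) - k, Fraction(k)) == best_integer_split(x))
```

The value of `optimal_k` changes exactly at $x = 2j^2 - 1$, so it equals 6 for every $x$ in $[71, 97)$ and becomes 7 at $x = 97$.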


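Proof: By Lemma 3 we may restrict our attention to feasible solutions with an
integer number of drums at F. It suffices to show that
$$\text{if } t < k, \text{ then } d(x - t - 1, t + 1) \ge d(x - t, t) \tag{6}$$
and
$$\text{if } t > k, \text{ then } d(x - t + 1, t - 1) \ge d(x - t, t) \tag{7}$$
hold for positive integer $t \le x/2$.
In the remainder of this proof let $m = \lfloor x \rfloor - t$ and $f = x - \lfloor x \rfloor$. In order to
verify (6) we have:
$$\begin{aligned}
d(x - t - 1, t + 1) - d(x - t, t) &= d(m - 1 + f, t + 1) - d(m + f, t) \\
&= f\left(\frac{1}{2m} - \frac{1}{2m+2}\right) - \left(\frac{1}{2m} - \frac{1}{2t+1} + \frac{1}{2t+2}\right).
\end{aligned} \tag{8}$$
Since $t < k = \lfloor((x + 1)/2)^{1/2}\rfloor$, we have $t + 1 \le ((x + 1)/2)^{1/2}$ or $\lfloor x \rfloor \ge
2(t + 1)^2 - 1$. Thus $2m = 2(\lfloor x \rfloor - t) \ge 2((2(t + 1)^2 - 1) - t) = (2t + 1)(2t + 2)$
or
$$\frac{1}{2m} - \frac{1}{2t+1} + \frac{1}{2t+2} = \frac{1}{2m} - \frac{1}{(2t+1)(2t+2)} \le 0.$$
This and the fact that
$$\frac{1}{2m} - \frac{1}{2m+2} > 0$$
imply that the right side of (8) is nonnegative, as required.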
In order to verify (7) we have
$$\begin{aligned}
d(x - t + 1, t - 1) - d(x - t, t) &= d(m + 1 + f, t - 1) - d(m + f, t) \\
&= -f\left(\frac{1}{2m+2} - \frac{1}{2m+4}\right) + \frac{1}{2m+2} - \frac{1}{2t-1} + \frac{1}{2t} \\
&= \frac{-2f}{(2m+2)(2m+4)} + \frac{1}{2m+2} - \frac{1}{(2t-1)2t} \\
&\ge \frac{-2f}{(2m+2)(2m+2f+2)} + \frac{1}{2m+2} - \frac{1}{(2t-1)2t}.
\end{aligned} \tag{9}$$
Since integer $t > k = \lfloor((x + 1)/2)^{1/2}\rfloor$, we have that $t > ((x + 1)/2)^{1/2}$
or $x < 2t^2 - 1$. Thus $2(m + f) + 2 = 2(x - t) + 2 < 2((2t^2 - 1) - t) + 2 =
(2t - 1)2t$, and so
$$\frac{1}{2m+2f+2} - \frac{1}{(2t-1)(2t)} > 0.$$
Adding and subtracting $1/(2m + 2)$ gives
$$\frac{-2f}{(2m+2)(2m+2f+2)} + \frac{1}{2m+2} - \frac{1}{(2t-1)2t} > 0.$$
That is, the right side of (9) is nonnegative as required.
This completes the proof of Theorem 2.


Theorem 2 can be viewed as providing the solution to the equivalent problem: 
Given unlimited fuel at each end of a desert of fixed length, minimize the amount 
of fuel required for a round trip across the desert. Since our solution gives a
distance roughly one half of the harmonic number $H_{\lfloor x \rfloor}$, any length desert can be
crossed given the availability of sufficient fuel. It is also interesting to note that
although our solution is better than placing half of the fuel at each end of the 
desert, the difference between the two solutions is not great. In fact it is bounded 
above by a small constant. It is easy to show, using common identities and 
estimates for harmonic numbers that this difference is always less than 1 + In2. 
Finally, we observe that the number of intermediate fuel depots for our optimal 
solution is [x] — 2. 

We next consider the problem of finding the position of two depots so that a
desert of maximum length can be crossed on a round trip with n drums of fuel, n
an integer, distributed between the two depots. Obviously one depot, $B_1$, must be
at the start; otherwise the jeep cannot move. At the second depot, $B_2$, suppose
that we have k units of fuel, where k is not necessarily an integer. Let r be the
amount of fuel at $B_2$ which is used on the return trip and $t = k - r$ be the
amount which is used to continue the trip across the desert. Without loss of
generality we can assume that in an optimal solution on the return trip the jeep
arrives at $B_2$ without fuel, for that fuel could be included in the k units stored at
$B_2$ where k need not be an integer. Let $s + f_1$ be the amount of fuel at S, let
$r + f_2$ be the amount of fuel at $B_2$ which is used on the return trip, and $t + f_3$ be
the amount at $B_2$ used to continue the outbound trip where s, r, t are integers and
$0 \le f_i < 1$ for $i = 1, 2, 3$ such that $s + f_1 + r + f_2 + t + f_3 = n$. The maximum
width desert which can be crossed via a round trip is:
$$\begin{aligned}
&\frac{f_1}{2s+2} + \frac{1}{2s} + \frac{1}{2s-2} + \cdots + \frac{1}{2r+4} + \frac{1-f_2}{2r+2} + \frac{f_2}{2r+1} + \frac{1}{2r-1} + \cdots + \frac{1}{3} + 1 \\
&\quad + \frac{f_3}{2t+2} + \frac{1}{2t} + \frac{1}{2t-2} + \cdots + \frac{1}{4} + \frac{1}{2}.
\end{aligned} \tag{10}$$
This follows immediately from the fact that we can consider the problem as a
round trip from $B_1$ to $B_2$ and a round trip onward from $B_2$.

TABLE 2. Optimal partitions and distance for various values of n.

  n    s    r    t    Distance from S to B_2    Desert Length
  3    1    1    1    1                         1 1/2
  4    1    1    2    1                         1 3/4
  5    2    1    2    1 1/4                     2
  6    2    1    3    1 1/4                     2 1/6
  7    3    1    3    1 5/12                    2 1/3
  8    3    1    4    1 5/12                    2 11/24
  9    4    1    4    1 13/24                   2 7/12
 10    4    1    5    1 13/24                   2 41/60
 11    5    1    5    1 77/120                  2 47/60
 12    5    2    5    1 29/40                   2 13/15
 30   14    2   14    (1/2)H_14 + 7/12          H_14 + 7/12
 31   14    3   14    (1/2)H_14 + 37/60         H_14 + 37/60

Thus we need to decide how to partition n drums of fuel into 3 parts so as to
maximize (10). It is easy to show, similar to Lemma 3, that for any feasible solution
with value given by (10) there is a feasible solution at least as large where each $f_i$
is an integer. We state without proof Theorem 3 which gives, for arbitrary n, an
optimal partition, the maximum desert length, and an optimal position for depot
$B_2$. Table 2, in which $H_n$ denotes the nth harmonic number, shows these optimal
partitions and distances for some sample values of n. We also observe that for an
optimal solution, $B_2$ is placed slightly more than half way across the desert from
$B_1 = S$. In order to state Theorem 3 we make two definitions:

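First define
$$d(s, r) = \frac{1}{2s} + \frac{1}{2s-2} + \cdots + \frac{1}{2r+2} + \frac{1}{2r-1} + \cdots + \frac{1}{3} + 1.$$
Second given $n, t$, positive integers with $n > t$ we define:
$$D(n, t) = \frac{1}{2s} + \frac{1}{2s-2} + \cdots + \frac{1}{2r+2} + \frac{1}{2r-1} + \cdots + \frac{1}{3} + 1 + \frac{1}{2t} + \frac{1}{2t-2} + \cdots + \frac{1}{4} + \frac{1}{2},$$
where $s + r = n - t$ and $r = \lfloor((n - t + 1)/2)^{1/2}\rfloor$.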
Theorem 3. The maximum width desert which can be crossed by a round trip using
$n \ge 3$ drums of fuel is $D(n, t)$, where $r$ is given above and $s = \lfloor(n - r)/2\rfloor$, $t =
\lceil(n - r)/2\rceil$. The depots $B_1$ and $B_2$ are located at S and at distance $d(s, r)$ from S
respectively.


Observe that the definition of r is similar to that of k in the previous problem.
The values of r, s, and t are defined by the above system of four equations, a
solution to which can be found in any one of several ways. One easy way is to find
r by Theorem 4 below.

The proof of Theorem 3 is similar to, but more complicated than, that of 
Theorem 2. Any interested reader may obtain a copy of the proof from the third 
author. 


Theorem 4. If n is a positive integer and m is the least positive integer such that
4m² + 7m ≥ n, then r = m.


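Proof: It suffices to show that for t = ⌈(n − m)/2⌉, if n = 4m² + 7m or n =
4(m − 1)² + 7(m − 1) + 1, then m = ⌊((n − t + 1)/2)^{1/2}⌋. Let n = 4m² + 7m,
then t = 2m² + 3m and

\[
\left\lfloor \left( \frac{n-t+1}{2} \right)^{1/2} \right\rfloor
= \left\lfloor \left( m^2 + 2m + \frac{1}{2} \right)^{1/2} \right\rfloor = m.
\]

Let n = 4(m − 1)² + 7(m − 1) + 1 = 4m² − m − 2. Then t = 2m² − m − 1 and

\[
\left\lfloor \left( \frac{n-t+1}{2} \right)^{1/2} \right\rfloor
= \left\lfloor \left( m^2 \right)^{1/2} \right\rfloor = m.
\]

This proves Theorem 4.
Finally from Theorem 4, it follows that r = ⌈(−7 + (49 + 16n)^{1/2})/8⌉.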
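Theorems 3 and 4 can be checked numerically. The following is a minimal Python sketch, not the authors' procedure: it assumes d(s,r) is the sum of the even reciprocals 1/(2s), ..., 1/(2r+2) plus the odd reciprocals 1/(2r−1), ..., 1/3, 1, and that D(n,t) adds the even reciprocals 1/(2t), ..., 1/4, 1/2. The function names are illustrative only.

```python
from fractions import Fraction


def optimal_partition(n):
    """Return (s, r, t) for n >= 3 drums, following Theorems 3 and 4."""
    # Theorem 4: r is the least positive integer m with 4m^2 + 7m >= n.
    r = 1
    while 4 * r * r + 7 * r < n:
        r += 1
    s = (n - r) // 2      # floor((n - r) / 2)
    t = n - r - s         # ceil((n - r) / 2)
    return s, r, t


def depot_distance(s, r):
    """d(s, r) under the assumed reading: even terms 1/(2s)..1/(2r+2), odd terms 1/(2r-1)..1."""
    even = sum(Fraction(1, 2 * j) for j in range(r + 1, s + 1))
    odd = sum(Fraction(1, 2 * j - 1) for j in range(1, r + 1))
    return even + odd


def desert_length(n):
    """D(n, t) under the assumed reading: d(s, r) plus 1/(2t) + ... + 1/4 + 1/2."""
    s, r, t = optimal_partition(n)
    return depot_distance(s, r) + sum(Fraction(1, 2 * j) for j in range(1, t + 1))
```

Under these assumptions `optimal_partition(12)` gives `(5, 2, 5)` and `desert_length(5)` gives exactly 2, in agreement with the corresponding rows of Table 2.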


4. DEWDNEY’S PROBLEM. Suppose a jeep, which achieves 10 miles per gallon 
of fuel, can carry one 50 gallon drum of fuel in addition to at most 10 gallons in its 
tank. Dewdney [Dew] asked for an algorithm which maximizes the one-way 
distance the jeep can attain using n drums available at the start.

It may appear that this problem is the same as, or at least very similar to, the 
one-way problem in Section 1. Dewdney’s problem, however, is somewhat more 
subtle. Change the units so that the jeep travels 1 unit of distance on one tank of 
fuel. One drum holds 5 tankfuls of fuel, and the jeep can carry one drum in 
addition to the fuel in its tank. Fuel can be stored in drums only. Thus at most 5/6 of
the capacity of the jeep can be stored, whereas in Theorem A any fraction of the
jeep’s capacity can be stored. Theorem A gives an upper bound for the Dewdney 
problem once an appropriate change of units is made. A somewhat complicated 
optimal algorithm for the Dewdney problem is given in [Jac]. This algorithm is
optimal for all n, but attains the Theorem A bound only for small n. 

A friend, Ken Maddex, misunderstood a discussion with one of the authors 
regarding Dewdney’s problem. The Maddex problem: Given an unlimited fuel 
supply, but only n drums for carrying fuel and only one jeep of the Dewdney kind, 
maximize the distance into the desert that the jeep can attain. Of course, there are 
both one way and roundtrip variations of this problem. As with other jeep 
problems finding a travel algorithm is easy but deciding optimality is apparently 
not easy. 
Finally we note that Brauer and Brauer [Bra] considered a problem similar to
Dewdney with their jeep able to carry 1 drum and its tank able to hold 1 drum. 
They also added the constraint that the tank could refill only when it was empty. 
They developed a number of algorithms but did not prove any of them optimal 
except for very small n. 


The authors would like to thank Professor David Gale for many helpful suggestions which improved 
the exposition of this paper. 
From The Autobiography of Malcolm X,
Random House, Toronto, 1992, p. 29. 


I’m sorry to say that the subject I most 
disliked was mathematics. I have thought 
about it. I think the reason was that 
mathematics leaves no room for argument. 
If you made a mistake, that was all there 


was to it. 


—Malcolm X 


Contributed by Nicholas Buck 
College of New Caledonia 

Main Campus 3330, 22nd Avenue
Prince George, British Columbia
V2N 1P8 CANADA
Count-Wheels: A Mathematical Problem
Arising in Horology 


Steven H. Weintraub 


In this article I will describe a brilliant invention made by an unknown medieval 
clockmaker. In inventing this device, the “subsidiary count-wheel”, this clockmaker
solved a special case of an interesting mathematical problem, whose general 
solution I will present here. 


1. HISTORICAL BACKGROUND. The English word “clock” derives from the 
Latin word “clocca”, which means bell. (Compare the German word for bell, 
Glocke.) This etymology, surprising at first, is actually quite logical. The first 
mechanical clocks were so-called “tower clocks”, mounted in church towers and 
the towers of other tall buildings. While the very first clocks solely kept time, as 
soon as striking clocks were invented, sounding the hours (by ringing a bell) 
became the main function of clocks. To see why this is so, let us put ourselves in 
medieval Europe. 

For peasant farmers, there is little need to tell time. Their lives are governed by 
the position of the sun in the sky. Only with the development of trade did this 
become important. Consider two merchants. In order to transact business, they 
must arrange a time (and place) to meet. If they agree to meet at 10:00, say, they 
must know when 10:00 is. To find the time they consult the town clock, and in 
order to do so, they must be within its range. To maximize the range, the clock is 
located in a tower, but the range is still limited. However, once a chiming 
mechanism is installed, the range is vastly increased, for, as we all know, a clock 
can be heard to strike the hours over a much wider area than it can be seen from. 

Thus we can see that, originally, it was more important for clocks to announce 
the time than to keep it. Indeed, it is more accurate to think of early clocks as 
defining time rather than measuring it, and medieval clockmakers made much 
greater progress in developing striking trains, the mechanism that chimes the
hours, than in developing going trains, the mechanism that keeps the time. 
(Subsequently, clockmaking became a race to develop more and more accurate 
going trains.) 

The very first striking clocks simply struck once each hour. The first striking 
clock which actually counted the hours was installed in the campanile of the church of
S. Gottardo in Milan in 1336. This clock counted the hours 1, 2, 3, ..., 24 (note the
use of the 24-hour clock). Thus part of the mechanism for this clock was a
“count-wheel” to count the hours.

This count-wheel is the large wheel schematically illustrated in Figure 1,
reprinted from [1, p. 26]. (Although the smaller wheel turns out to be our principal 
object of interest, we shall ignore it for the moment.) This wheel was mounted on 
the same axis as a wheel with (regularly spaced) gear teeth, which actually drove 
the striking mechanism. This system operated as follows: At midnight the wheel is 
Figure 1. A count wheel with subsidiary count wheel. The subsidiary wheel is mounted on the arbor
that drives the principal count wheel. The numbers of strokes indexed on the ridges of the subsidiary
wheel are marked. Indexing (on both wheels by the same indexing arm) takes place at the top; the
position is after 18 hours have been struck.


positioned with a lever, which prevents it from turning, in the notch at the top. At 
1:00 the lever is raised, and the wheel rotates counter-clockwise until the lever falls 
into the next notch, stopping it. The distance between notches is such that one 
gear tooth engages the mechanism, causing the clock to strike once. At 2:00 a 
similar process occurs, but now the distance between notches is such that two gear 
teeth engage the mechanism, causing the clock to strike twice, etc. In the 
illustration the count-wheel is positioned ready to strike 19:00. (The observant 
reader will notice that notches for hours 1 through 4 are not individually cut, but 
rather there is a single long notch covering these positions. This omission will be 
explained below.) This quite straightforward method (which does not raise any 
mathematical questions) had, unfortunately, one severe drawback: it was very 
unreliable. To manufacture a count-wheel required making a wheel with 24 
irregularly spaced notches and another with 1 + 2 + ··· + 24 = 300 regularly
spaced teeth, and this was a difficult task for 14th century metalworkers. If the 
count-wheel were not made precisely enough, it might, for example, stop after 
engaging one gear too few, ruining the striking pattern. Even a count-wheel which 
initially worked properly might eventually fail due to the notches or teeth wearing
down in service. Finally, there was a rapid development in the miniaturization of 
clocks, in which table clocks became widespread, and this exacerbated the prob- 
lem, for as hard as it was to construct large count-wheels for tower clocks, it was 
much harder to construct fine count-wheels for table clocks. 

This problem was solved by an unknown genius, who came up with the idea of 
supplementing the original, or primary, count-wheel, with a second, or subsidiary, 
count-wheel. (The method of operation of this subsidiary count-wheel is our main 
interest here, but so as not to interrupt the historical discussion, I will defer 
describing it.) Not only don’t we know who invented this, we don’t know when, 
either, only that it was already known in 1475. 

The subsidiary count-wheel was in use until well into the 16th century, but in 
the course of the 17th century this construction ceased to be used. (The most 
recently dated known example of a clock with a subsidiary count-wheel is from 
1568.) The reason for this is two-fold. First, metal-working techniques had
advanced, and second, and most decisively, the use of the 12-hour clock (i.e.,
counting the hours in a day 1,...,12,1,...,12 instead of 1,...,24) had won out. A
primary count-wheel for a 12-hour clock requires 12 irregularly spaced notches
(rather than 24) to control a wheel with 1 + 2 + 3 + ··· + 12 = 78 (rather than
300) teeth, and these smaller numbers were within the capacity of then-current 
manufacturing techniques. 

Our earliest information on subsidiary count-wheels comes from Frater Paulus 
Almanus (Brother Paul the German). Frater Almanus was a German monk who 
made a pilgrimage to Rome in 1475. He stayed in Rome, supporting himself by 
opening a shop where he bought and sold clocks, and by maintaining and repairing 
clocks for various Church dignitaries. To help himself out, he kept a notebook in 
which he recorded various noteworthy features of clock mechanisms, and among 
these was the subsidiary count-wheel. He compiled this notebook (written in Latin, 
of course) between 1475 and 1485, and it found its way to a library in Augsburg, 
where it lay undiscovered until the 20th century. It was translated and deciphered 
in stages, and John H. Leopold and Phillip G. Coole were the first to uncover and 
understand the section on count-wheels (among others). This manuscript is pub- 
lished in English translation, with commentary and explanation, in [1]. 


2. OPERATION OF THE SUBSIDIARY COUNT-WHEEL. The subsidiary count- 
wheel is the small wheel illustrated in Figure 1. It is mounted so that its top and
the top of the primary count-wheel are adjacent. The lever, whose fall stopped the 
rotation of the primary count-wheel, was made twice as wide, so that it could fall 
only when both the primary and subsidiary count-wheels had rotated into a 
position in which they both had notches at the top. In this way the subsidiary 
count-wheel acted as a safety device, preventing the fall of the lever unless the 
right number of chimes had been sounded. 

Let us closely examine the operation of the subsidiary count-wheel. Note that it 
is irregularly divided, with notches spaced to allow turning through 1, 2, 3, 4, 3, and 
2 chiming gear teeth. At midnight it is positioned in the slot to the left of the 1. At 
1:00 the lever is raised, and the wheel rotates counter-clockwise 1 notch before the 
lever drops. At 2:00 it rotates 2 notches; at 3:00 it rotates 3; at 4:00 it rotates 4. At 
5:00 it rotates 3 + 2 = 5. (It is prevented from stopping after 3 chimes because the 
primary count-wheel is not in position.) Then 6:00 is counted by 1 + 2 + 3, and 
7:00 by 4 + 3. (Here we see why we could not simply count 5:00 by 5; a break is 
needed to enable counting 7:00.) Then 8:00 is counted by 2 + 1 + 2 + 3, 9:00 by 
4+3+2, 10:00 by 1+2+3+4, etc. Finally, after counting 24:00, the sub- 
sidiary count-wheel has returned to its starting position, having made 20 = 300/15 
complete revolutions, ready to count the hours in the next day. (This also explains 
why there were not individual notches cut on the primary count-wheel for the 
hours 1:00 to 4:00. The clock merely relied on the more reliable subsidiary 
count-wheel to count these.) 

By way of further clarification, let us consider Figure 1, where the clock is
poised to strike 19:00. When that time arrives, the count-wheels rotate in the 
indicated direction. The primary count-wheel (the larger one) rotates one notch, 
which indexes 19 strokes. The secondary count-wheel (the smaller one) rotates 
through the regions 4, 3, 2, 1, 2,3, 4, indexing a total of 19 strokes as well. In doing 
so it makes (in this case) somewhat more than a complete revolution; to be precise, 
it stops when it has turned through a complete revolution plus one additional 
notch. At 20:00 a similar process occurs. The primary count-wheel rotates one 
notch, which indexes 20 strokes. The secondary count-wheel rotates through the
regions 3, 2, 1, 2, 3, 4, 3, 2, indexing a total of 20 strokes as well. In doing so it
makes (in this case) somewhat more than a complete revolution; to be precise, it 
stops when it has turned through a complete revolution plus two additional 
notches. At 21:00 a similar process occurs, and so on. 
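
The walk-through above can be replayed mechanically. The following is a minimal Python sketch of the counting only, not a model of the actual clockwork; the names `WHEEL` and `regions_for_hour` are illustrative.

```python
from itertools import accumulate

WHEEL = (1, 2, 3, 4, 3, 2)               # regions between notches, in order of rotation
WEIGHT = sum(WHEEL)                      # 15 gear teeth per revolution
OFFSETS = [0, *accumulate(WHEEL[:-1])]   # notch positions: [0, 1, 3, 6, 10, 13]


def regions_for_hour(hour):
    """Regions the subsidiary wheel turns through while `hour` strokes are struck."""
    teeth_before = hour * (hour - 1) // 2        # strokes struck since midnight
    # Raises ValueError if the wheel is not resting in a notch before this hour.
    idx = OFFSETS.index(teeth_before % WEIGHT)
    regions, total = [], 0
    while total < hour:
        regions.append(WHEEL[idx % len(WHEEL)])
        total += regions[-1]
        idx += 1
    if total != hour:
        raise ValueError(f"no notch after exactly {hour} strokes")
    return regions
```

For instance, `regions_for_hour(5)` returns `[3, 2]` and `regions_for_hour(19)` returns `[4, 3, 2, 1, 2, 3, 4]`, as described in the text; the call succeeds for every hour from 1 to 24, and 300/15 = 20 revolutions bring the wheel back to its starting notch.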

Actually, there were several subsidiary count-wheels in use, but from a horologi- 
cal standpoint, this was the optimal one. Note that in any subsidiary count-wheel, 
the total number of gears counted in a complete revolution must be a divisor of 
300, the total number of gears counted by the primary count-wheel in its complete 
revolution (in a day), so that the subsidiary count-wheel will be correctly positioned 
at the start of a new day. Also, as a general rule, there is more tolerance in making 
a rapidly rotating wheel than a slowly rotating one, so this number should be small. 
Finally, since a primary count-wheel was most likely to be in error by one 
gear-width, the subsidiary count-wheel should have as small a proportion of one 
gear-width segments between notches as possible. Given these criteria, this design 
was best. 


3. THE MATHEMATICS OF COUNT-WHEELS. I will be dealing exclusively with 
schemes modelled on the subsidiary count-wheel, so I will drop the adjective 
subsidiary and simply describe them as count-wheels. 

I have long been interested in horology, and several years ago I revisited the 
British Museum clock room (a must for anyone interested in the subject). There I 
saw a display describing the operation of this count-wheel. I quickly realized that 
not only could this count-wheel count the integers 1 through 24, it could count all 
positive integers! 

Since I am a mathematician, the general question arose in my mind. I will 
denote the above mentioned count-wheel by (1, 2, 3, 4, 3, 2) and call it a count-wheel
of weight 15 = 1 + 2 + 3 + 4 + 3 + 2 and length 6 (since there are six numbers
in it). Define a count-wheel to be a count-wheel which can count all positive 
integers. My question was: Does there exist a count-wheel of weight w, for every 
positive integer w? 

This question is actually not a very interesting one, since a moment’s reflection 
reveals that, for any w, (1,1,1,...,1) (w entries) is a count-wheel. We could ask 
whether there exists a shortest (i.e., having smallest length) count-wheel of weight 
w, but this is again uninteresting, since the lengths of count-wheels of weight w are 
positive integers, so among them is a smallest one. However, we can refine this a 
bit further to get some questions of real mathematical interest: 


Question 1. Given a positive integer w, does there exist a unique shortest count- 
wheel of weight w? 


Question 2. If so, how can we construct it? 
Question 3. If so, what is its length? 

In this paper, we will precisely define count-wheels, and answer these questions.
(The answer to question 1 is yes, for every w.) We will merely give the results here.
For the proofs we refer the reader to our paper [2].


Definition 1. A sequence A = (a_1, ..., a_n) of positive integers is a count-wheel of
length n and weight w = a_1 + ··· + a_n if it has the following property: Let Ā be the
infinite sequence (ā_i)_{i=1,2,...} = (a_1, ..., a_n, a_1, ..., a_n, ...). Then there exists a
sequence 0 = i(0) < i(1) < i(2) < ··· such that for every positive integer k,

\[
\sum_{i=i(k-1)+1}^{i(k)} \bar{a}_i = k. \tag{*}
\]
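
Condition (*) can be tested by machine. The following is a minimal Python sketch under one observation that is not stated in the text: (*) holds for every k exactly when each triangular number T_k = 1 + 2 + ··· + k, reduced mod w, lands on a notch position (a partial sum of A), and since T_{k+2w} ≡ T_k (mod w) it is enough to check k = 1, ..., 2w. The function name is illustrative.

```python
from itertools import accumulate


def is_count_wheel(a):
    """Check Definition 1 for a finite sequence `a` of positive integers."""
    w = sum(a)
    # Notch positions: partial sums of a, taken mod w (so the full sum w counts as 0).
    notches = {0} | {p % w for p in accumulate(a)}
    # T_k mod w is periodic in k with period dividing 2w, so 2w values suffice.
    return all((k * (k + 1) // 2) % w in notches for k in range(1, 2 * w + 1))
```

With this check, `is_count_wheel((1, 2, 3, 4, 3, 2))` is true, while `is_count_wheel((1, 2, 3, 4, 5))` is false because T_7 = 28 ≡ 13 (mod 15) is not a notch position.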


Definition 2. A count-wheel B = (b_1, ..., b_m) is an amalgam of a count-wheel
A = (a_1, ..., a_n) if B ≠ A and

\[
\begin{aligned}
b_1 &= a_1 = 1,\\
b_2 &= a_2 + \cdots + a_{i(2)},\\
b_3 &= a_{i(2)+1} + \cdots + a_{i(3)},\\
    &\;\;\vdots\\
b_m &= a_{i(m-1)+1} + \cdots + a_n.
\end{aligned}
\]

A count-wheel A is called reduced if it has no amalgams.


For example, (1, 2, 3, 4,3, 2) is an amalgam of (1, 2, 3, 2, 2, 3, 2). 
Our first result is:


Theorem 3. For every positive integer w, there is a unique reduced count-wheel of
weight w, denoted [w]. Also, [w] is an amalgam of any other count-wheel of weight
w.


Note that this result gives a positive answer to question 1, as [w], being an 
amalgam of every other count-wheel of weight w, is certainly the shortest count- 
wheel of weight w. (Granting that (1, 2,3, 4, 3,2) is a count-wheel, our description 
of its operation in the previous section shows that all of the notches in it are 
necessary, i.e., that it is reduced.) 

Now we proceed to give an algorithm for producing [w], answering Question 2. 
In the following definition, tB denotes the t-fold repetition of B, e.g., 2(1, 2,2) = 
(1, 2, 2, 1, 2, 2). Note that if B is a count-wheel, so is tB, for any positive integer t.


Definition 4. A count-wheel A is primitive if the equation A = tB only has the 
solution t= 1, B=A. 


Our second result is: 


Theorem 5. (a) The reduced count-wheel [w] is primitive if w is odd. (b) If w = 2^t v,
t ≥ 0, then [w] = 2^t [v].


(From part (b) we see that in fact we have if and only if in part (a). Clearly,
[1] = (1), so from part (b) we also see that for any t ≥ 0, [2^t] = (1, 1, 1, ..., 1).)

Note that this theorem reduces our problem to that of constructing [w] for w 
odd. 


Theorem 6. Let w be odd. The following algorithm produces a reduced count-wheel 
of weight w: 

Index the positions on a wheel 1,...,w clockwise and place a 1 in each position. 
Begin with a pointer between positions w and 1, and cut a notch there. 

Step 1: For k = 1,...,(w — 1)/2, successively rotate the pointer k positions 
clockwise and cut a notch at the point where the pointer stops, if there is not one there 
already. 
Step 2: Begin with the empty sequence and the pointer positioned at the notch
between positions w and 1. Rotate the pointer clockwise until a notch is reached, and 
let the next term of the sequence be the number of 1’s the pointer has passed in doing 
so. Do this until the pointer has returned to its original position. 


Note that Step 2 is merely counting the result of Step 1, which is the heart of 
the algorithm. Note also the real content of Step 1. This step merely mimics 
counting the hours, so if it read “For k = 1, 2, ...” it would be obvious. But if it
read so, it would not be an algorithm, for k would be ranging over the infinite set 
of positive integers. Of course, there are only a finite number of notches in the 
wheel, so after some integer the pointer keeps falling into already cut notches. 
However, in order to have an algorithm, we must know what that integer is, and 
that is precisely what the theorem tells us. (In other words, it says that a 
count-wheel which counts 0,1,...,(w — 1)/2, for w odd, counts forever.) Note 
also that this theorem is in general sharp, in that we must start at 0 and count up 
to (w — 1)/2, and not stop sooner. (In particular, this is always the case whenever 
w is prime, as we see from Theorem 7(b) below.) Application of this algorithm
yields [15] = (1, 2, 3, 4,3, 2), recovering the result of our unknown medieval clock- 
maker. 
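
The algorithm of Theorem 6, combined with Theorem 5(b) for even weights, is short enough to write out. The following is a minimal Python sketch of that procedure; the function name and the representation of notches as residues mod w are choices made here for illustration.

```python
def reduced_count_wheel(w):
    """Return [w] as a tuple, using Theorem 6 for the odd part and Theorem 5(b) for powers of 2."""
    # Theorem 5(b): if w = 2^t * v then [w] is the 2^t-fold repetition of [v].
    reps = 1
    while w % 2 == 0:
        w //= 2
        reps *= 2
    # Step 1: start at the notch between positions w and 1 (residue 0), then
    # rotate k positions for k = 1, ..., (w - 1)/2, cutting a notch at each stop.
    notches = {0}
    pos = 0
    for k in range(1, (w - 1) // 2 + 1):
        pos = (pos + k) % w
        notches.add(pos)
    # Step 2: read off the gaps between consecutive notches around the wheel.
    cuts = sorted(notches) + [w]
    wheel = tuple(b - a for a, b in zip(cuts, cuts[1:]))
    return wheel * reps
```

Here `reduced_count_wheel(15)` returns `(1, 2, 3, 4, 3, 2)`, the medieval clockmaker's wheel.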

Let us illustrate this algorithm with a further example. We shall compute [35]. 
To save space, however, we will not write a large wheel, but rather work “linearly’’. 
Further, to avoid eye-strain we will work by “breaking down” 35 rather than by 
“building up” from a string of 35 1’s. (The reader should have no trouble seeing 
that this procedure is logically equivalent to that of Theorem 6.) We denote the 
current pointer position with a slash and already existing divisions with a comma, 
and we drop the leading comma. Our algorithm says we must consider k =
0, 1, 2, ..., (35 − 1)/2 = 17, so it will have 18 steps. They are


/35 → 1/34 → 1,2/32 → 1,2,3/29 → 1,2,3,4/25

→ 1,2,3,4,5/20 → 1,2,3,4,5,6/14 → 1,2,3,4,5,6,7/7

→ 1/2,3,4,5,6,7,7 → 1,2,3,4/5,6,7,7

→ 1,2,3,4,5,5/1,7,7 → 1,2,3,4,5,5,1,7,3/4

→ 1,2,3,2/2,5,5,1,7,3,4 → 1,2,3,2,2,5,5,1/7,3,4

→ /1,2,3,2,2,5,5,1,7,3,4 → 1,2,3,2,2,5/5,1,7,3,4

→ 1,2,3,2,2,5,5,1,7,3/4 → 1,2,3,2,2,3/2,5,1,7,3,4.

Thus we obtain [35] = (1, 2, 3, 2, 2, 3, 2, 5, 1, 7, 3, 4).


We now dispose of the question of the length of reduced count-wheels, 
answering Question 3. We let λ(w) denote the length of [w].


Theorem 7. (a) λ(2^t) = 2^t for all t ≥ 0.
(b) If w is odd, λ(w) ≤ (w + 1)/2, with equality if and only if w is
prime.
(c) If v and w are relatively prime, λ(vw) = λ(v)λ(w).
(d) For an odd prime p, and any t ≥ 1,

\[
\lambda(p^{2t-1}) = \frac{p^{2t}-1}{2(p+1)} + 1,
\qquad
\lambda(p^{2t}) = \frac{p\,(p^{2t}-1)}{2(p+1)} + 1.
\]
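
As a quick sanity check of Theorem 7, taking part (d) to read λ(p^{2t−1}) = (p^{2t} − 1)/(2(p + 1)) + 1 and λ(p^{2t}) = p(p^{2t} − 1)/(2(p + 1)) + 1, a few values can be worked by hand and compared with the lengths of wheels given in this article:

```latex
\begin{align*}
\lambda(15) &= \lambda(3)\,\lambda(5) = 2 \cdot 3 = 6
  && \text{by (c); } [15] = (1,2,3,4,3,2) \text{ has 6 entries},\\
\lambda(35) &= \lambda(5)\,\lambda(7) = 3 \cdot 4 = 12
  && \text{by (b) and (c); } [35] \text{ has 12 entries},\\
\lambda(3^{2}) &= \frac{3\,(3^{2}-1)}{2\,(3+1)} + 1 = 4
  && \text{by (d) with } p = 3,\ t = 1,\\
\lambda(5^{2}) &= \frac{5\,(5^{2}-1)}{2\,(5+1)} + 1 = 11
  && \text{by (d) with } p = 5,\ t = 1,\\
\lambda(3^{3}) &= \frac{3^{4}-1}{2\,(3+1)} + 1 = 11
  && \text{by (d) with } p = 3,\ t = 2.
\end{align*}
```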
For the edification of the reader, we give the values of [w] for various w below.
To explain our notation, given that [5] = (1, 2, 2), and that [25] =
(1, 2, 2, 1, 4, 1, 4, 1, 4, 1, 4), we shall write [25] = ([5], 4(1, 4)). We then have

[3] = (1, 2)
[5] = (1, 2, 2)
[7] = (1, 2, 3, 1)
[9] = ([3], 2(3))
[11] = (1, 2, 1, 2, 4, 1)
[13] = (1, 1, 1, 3, 2, 2, 3)
[15] = (1, 2, 3, 4, 3, 2)
[17] = (1, 1, 1, 1, 2, 4, 1, 4, 2)
[19] = (1, 1, 1, 3, 1, 2, 1, 5, 2, 2)
[21] = (1, 2, 3, 1, 3, 3, 2, 6)
[23] = (1, 2, 2, 1, 3, 1, 3, 2, 5, 1, 1, 1)
[25] = ([5], 4(1, 4))
[27] = (2[9], 3(3))

[45] = (1, 2, 3, 4, 5, 3, 3, 7, 2, 3, 3, 9)
[49] = ([7], 6(1, 2, 4))
[81] = ([27], 2([9], 6(3)))
[121] = (1, 2, 3, 4, 1, [11], 9(1, 2, 3, 4, 1))
[125] = ([25], 2([25], 5(1, 4)))

As we have mentioned, the proof of these, and several other results on this 
subject, can be found in [2]. The methods used in these proofs are elementary. 
From Trotsky a Documentary, by Francis
Wyndham and David King, Penguin, 
London, 1972, p. 108. 


Logical arguments, even if Russell turns 
them into mathematical formulae, are im- 
potent against material interests. The rul- 
ing classes will let civilization perish 


together with mathematics rather than 
give up their privileges... 

—Trotsky 

Contributed by Nicholas Buck 
How to Teach a Class by the Modified
Moore Method 


Donald R. Chalice 


1. INTRODUCTION. Following is a description of the “Modified Moore Method”
that I have used successfully in classes ranging from intermediate analysis to 
advanced calculus to measure theory. While using this method, I have been able to 
cover as much material (and in a few cases more material) as in the usual 
lecture-style course. More importantly, with the Modified Moore Method, the 
students and I have covered that material in a far more enlivening, enjoyable and 
intellectually stimulating way. 

I have used the method with average students in average classes and with 
exceptional students in above-average classes and have found similar success. Over 
the past twenty years, I have found it much superior to simply lecturing, and if you 
try it I hope you will too. 

Places where changes have been introduced to the usual Moore method are 
indicated by a “*”. For further illustration see the sample of class notes in the
appendix. 


TEACHING A CLASS WITH THE MODIFIED MOORE METHOD 


(I) The first day. (1) The class is begun by handing out a set of notes containing 
the material to be presented. Each section of the notes begins with a set of 
definitions. 

*If your class is the students’ first exposure to “theorem proving’’, then begin 
your notes with a list of techniques on theorem proving from Pólya’s books [6, 7].
Use also the list from Chapter 1 in Loren Larson’s text [4]. 

* After the definitions are listed in the notes, you need to follow these by a short 
section entitled “Exercises on the Definitions”. This section consists of a set of 
elementary examples and exercises about the meanings of the definitions and is 
geared to help the average student understand the definitions as applied to simple 
examples. (See the appendix.) 

After the “Exercises on the Definitions” section in the notes, make a list of
theorems and conjectures. *Most of these theorems are true but some
“red-herrings” will be useful. For example, in an introductory course such as [1], some
students try to prove inductively that the union of a countable collection of closed
sets is closed; thus it becomes obvious that they do not understand induction, and
the discussion ensuing after their presentation is helpful to them. (*In more
advanced courses, you may put two, one or no “*”s preceding various theorems
to indicate their difficulty. Surprisingly, students prefer to do theorems with a “*”
than without.) *Very difficult theorems that you alone will prove will be preceded
by an “Q)” or “***”.

(2) Next present the class rules. The rules I present are:

(i) “You cannot talk to anyone, (not even your wife or girl-friend or husband or 
boy-friend) about the proofs of the theorems until they are done in class. You can 
however, come to see me and talk about your proofs during office hours or by
appointment.” 

(ii) “You cannot look at any textbooks pertaining to the course.¹ You cannot
look at any other set of class notes or solutions.” 

*(iii) “A list of proved theorems will be placed on reserve in the library after
they are done in class.” 

*(iv) “The quarter/semester will be divided into three periods. *An exam will be 
given at the end of each period on the theorems that have been proved in that 
period. You will receive a grade at the end of each of these periods to indicate 
how well you are doing.” 

*(v) Generally, you (the instructor) need to send three people to the board at 
once to write up their proofs. When they are finished with their write up, then 
send each in turn up to the board to present his or her proof to the class. 

*(vi) Encourage mistakes—up to a point. ““Do not be afraid to make mistakes 
on the board. If I were lecturing, I would only show you the ‘right’ way to do a
problem or proof. But those who make mistakes are also making a contribution in 
that they are showing us the limits of the proof or problem. But there will be a five 
minute time limit on mistakes. Generally, you (the student) should not try to patch 
a proof at the board if a mistake is found. Rather, if nothing comes in your allotted 
five minutes, make a graceful exit, perhaps saying something like ‘Well, I need to 
think about this some more.’”


(II) The second day. *Send at least 5 students to the board at once to write up 
their answers to the exercises on the definitions. *Thus all students are writing up 
their solutions at once. *Often assign the same exercise to more than one student 
to get an “alternative point of view’. 


(III) The third day. Students begin presenting their proofs.

*(1) Send them to the board in waves of three to write up their proofs together. 
Then after all proofs are written up, send each back to the board in turn to present 
their proof to the class. 

(2) Try to make the class atmosphere safe for thoughtful expression, partly, by 
taking the point of view that we are all working to help and encourage one-another 
in the endeavor of the class. Encourage a helpful attitude rather than cutting 
one-another down. Always let a proof or attempted proof be first presented 
without interruption, then look around the room to see whether understanding 
prevails. When I see a glazed look on some people, I ask if they understand the 
proof or problem. Sometimes, especially at the beginning they may have trouble 
phrasing a question. *I say “You can always ask for an ‘instant replay’.” So often,
proofs are “replayed.” During the repeat presentation, I consider myself something
of a conductor of the class as I can see how the presentation may be made more 
clear, and often after this second presentation I will make suggestions on where 
and how to simplify a proof or problem to make it more elegant and to improve its 
presentation at the board. (A lecturer, unfortunately, has no “conductor” to direct 
him toward making a clearer presentation!) 

I now present some reasons for the modifications made above in order of 
importance. The headings refer to the particular day in class. 


¹Usually do not relax this condition. But in more advanced courses, if textbooks are allowed then I
ask for a reference if the student uses one. 
Rationale for the Modifications

*III-1. It is very important that for each round of proofs you send three
students to the board together to write up their solutions at once. This simple 
device alone enables the course to go nearly as fast and sometimes faster than the 
corresponding lecture style course. And this way up to five or six proofs may be 
presented per class-hour. 

*I-2. Exercises on the definitions. Especially for average students, this section is 
indispensable and must be included to prevent much stumbling that would other- 
wise occur. It also helps the students warm up, especially at the beginning and is 
the basic device that allows us to use the Modified Moore-Method, in the first 
place, with students of average undergraduate calibre. For exercises be sure to 
send 5 to 7 students to the board at once. 

*I-3(iii). Documenting the solutions in the library frees students (if they wish)
from taking notes and allows them to pay full attention to the proof under 
discussion. However, many will take notes anyway. Additionally, your (the instruc- 
tor’s) proofs on reserve are of course, usually more concise and easier for them to 
learn for the exams later. 

*I-3(iv). Grading Periods. The three grading periods are very useful in that
students will perform better and more often if you give them an evaluation 
deadline. An exam given at the end of each grading period on the proofs already 
presented then forces them to review the material so that they will be able to 
proceed to the next level. Give the student a grade and an evaluation at the end of 
each period. 

Thus, with notes and exams, you should find that the above method teaches the 
required proofs as well as, if not better than, a standard lecture style class, but with
far greater enjoyment on your and the students’ parts and with more enhancement 
of the students’ interaction and creativity. 


REQUIREMENTS OF THE MODIFIED MOORE-METHOD 

(1) Class size. In general, class size should be limited to a maximum of 24, with 
ideal size about 14-18. (If your university is committed to quality of instruction,
then class sizes will be appropriately small since student to instructor ratio is one 
such measure of quality.) 

(2) Office hours. It is very important for the proper functioning of the class that 
the instructor be there at his or her office hour. (This might be viewed partly as 
preparation for the class.) So an equal number of office hours as class hours is 
appropriate. 

(3) More class hours makes the class work better. E.g., 4 hours per week seems 
to be the best number for most undergraduate classes, such as advanced calculus 
and linear algebra. Graduate classes work well with the usual 3 hour class; but 
some, such as Lebesgue integration, will work better if 4 class hours are allotted 
per week. 


CONCLUSION. I feel that a part (and possibly a fairly large part) of a student’s 
education should be exposure to courses taught by this Socratic type method. 
Partly, student reactions to this method convince me so. For example, when I 
encounter former graduate students who have jobs in the “real world” and ask 
them what courses they felt helped them the most, often I will get the reply that 
one of the Modified-Moore-Method courses did because in it they had to learn to 
express their own ideas convincingly and forcefully to a large group. 


CLOSING AND AN INVITATION. The cornerstone to this Socratic type method 
of teaching is the “enjoyment of Mathematics” and “class participation” and I 
hope that some of you reading this article would be encouraged enough to try 
using this method for yourself. If I can be of any assistance with questions you 
might have concerning the method or with notes from my class, I welcome and 
encourage your queries. 


APPENDIX—(From [1]) 
Chapter II. Open Sets and Closed Sets. 
PART A: Definitions. 


PART B: * Exercises to help with the definitions (samples follow). 

2.3. Draw: a neighborhood of p; an ε-neighborhood of p; a deleted neighborhood 
of p. 

2.4. Tell whether the following sets are open and why. (A list of open sets to 
check.) 

2.5-2.6. Which of the following sets are open? Which are closed? Neither? 
(examples) (By standard English usage some students may think “open” is opposite 
to “closed”.) 

2.7. Which of the points p, q, r are cluster points of the following sets and why? 
(examples) 

2.8. Using the quantifiers ∀ or ∃, symbolize the definitions for each of the 
following: 

a) O is an open set. a’) O is not an open set. etc. 


Prove or Disprove (List of theorems. Samples follow.) 
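7. $\left(\bigcap_{i=1}^{\infty} A_i\right) \cup \left(\bigcap_{i=1}^{\infty} B_i\right) = \bigcap_{i=1}^{\infty} (A_i \cup B_i)$. 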


8. Prove or disprove #7 if $A_1 \supset A_2 \supset \cdots$ and $B_1 \supset B_2 \supset \cdots$ are nested. 

32. The union of any collection of open sets is open. 

33. The union of any collection of closed sets is closed. (They tend to try to 
prove this inductively.) 


Chapter III. Connectedness. 
PART A: Definitions. 


PART B: *Exercises on the definitions. 

3.1. Draw: a) two disjoint sets. b) two mutually separated sets. c) two disjoint 
sets that are not mutually separated. 

3.2. Which of the sets S below are relatively open and which are relatively 
closed in the semicircle $\{(x, y) \mid x^2 + y^2 \le 1 \text{ and } y \ge 0\}$? (list of examples), etc. 

3.4. (Do 3.4 and 3.5 on the board together.) Which of the following sets A, B 
are mutually separated? A pictorial list of examples like: $A = N(0, 1)$, 
$B = \partial N(2, 1) \sim \{(2, 1), (2, -1)\}$. 

3.5. Which of the sets in 3.4 are a union of two mutually separated sets? Which 
of the above sets are connected? 

3.7. In your own words, what is a component of a set? Prove (samples follow): 
49. Rose Leaf Theorem. 
50. If p is an element of X then there exists a component of X that contains p. 

Many false proofs of this either resemble St. Thomas Aquinas’ proof of the existence of the 
“Prime Mover” or St. Anselm’s proof of the existence of “That Than Which Nothing Greater Can Be 
Conceived.” See [5]. 

*There is a connected set X in the plane such that if you remove a certain point p from it, then 
X ~ {p} is “totally disconnected,” i.e., the only components of X ~ {p} are single points. [2, 3] The 
point p is called an “explosion point”. There are three disjoint connected open sets in the plane with 
the same boundary. [2] Such sets at first glance seem impossible to draw. Perhaps such an “undrawable” 
set is a counterexample to the statement! 


Mathematics is not a careful march down a well-cleared 
highway, but a journey into a strange wilderness, where 
the explorers often get lost. Rigour should be a signal 
to the historian that the maps have been made, and the 
real explorers have gone elsewhere. 

—W. S. Anglin 
“Mathematics and History,” Mathematical Intelligencer, 4, #4. 


The Significant-Digit Phenomenon 


Theodore P. Hill 


It has been frequently observed that in many tables of physical constants and 
statistical data, the leading digit is not uniformly distributed among the digits 
{1,2,...,9} as might be expected; rather the lower digits appear much more 
frequently than the higher ones. Perhaps even more surprising, an exact distribu- 
tion for this nonuniformity of the leading digits has been generally asserted. In 
1881 Simon Newcomb [9] stated that “The law of probability of the occurrence of 
numbers is such that the mantissae of their logarithms are equally probable,” and 
concluded that 


$$\mathrm{Prob}(\text{first significant digit} = d) = \log_{10}(1 + d^{-1}), \qquad d = 1, 2, \ldots, 9. \tag{1}$$


(For example, (1) predicts that the leading digit is 1 with probability about .301, 
and at the other extreme, is 9 with probability .046.) 
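
As a quick numerical illustration of (1), the following Python sketch tabulates the nine first-digit probabilities. The function name `first_digit_probability` is our own label for this illustration and does not come from the article.

```python
import math


def first_digit_probability(d):
    """Probability that the first significant digit equals d under law (1)."""
    if d not in range(1, 10):
        raise ValueError("the first significant digit must be one of 1, ..., 9")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    # Print the whole first-digit distribution, rounded to three places.
    for d in range(1, 10):
        print(d, round(first_digit_probability(d), 3))
```

The nine values sum to log10(10) = 1, as a probability distribution on the digits 1 through 9 must.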

Although Newcomb offered no statistical evidence for (1), its rediscovery by the 
physicist Benford [2] some fifty-seven years later was supported by empirical 
evidence based on frequencies of significant digits from twenty different tables 
including such diverse data as surface areas of 335 rivers, specific heat of thou- 
sands of chemical compounds, and square-root tables. The union of his tables 
comes surprisingly close to the frequencies predicted in (1), and, Newcomb’s 
earlier paper having been overlooked, those frequencies came to be known as 
Benford’s Law, or the First Digit Law. In fact, Benford’s data not only comes 
surprisingly close, it comes suspiciously close to the predicted frequencies; Diaconis 
and Freedman [5, p. 363] offer convincing evidence that Benford manipulated the 
round-off errors to obtain an even better fit. But even the unmanipulated data 
seems a remarkably good fit, and the “law” has become widely accepted. 


CLASSICAL EXPLANATIONS. Since Benford, numerous “mathematicians, 
statisticians, economists, engineers, physicists and amateurs” [11, p. 521] have 
attempted to explain the probabilities appearing in (1) based on a variety of 
hypotheses. The classical explanations include: the usual number-theoretic (or 
Cesaro) method for assigning densities to the sets in question; continuous analogs 
of the Cesaro method based on integration techniques; various probabilistic 
urn-schemes; demonstrations based on assumptions of continuity and scale-invari- 
ance (see below); and statistical descriptive arguments. For an excellent review of 
these ideas, the reader is referred to Raimi [11]. (A more recent explanation of 
Schatte [12] gives Benford’s Law as a corollary to an “unproved” ([12, p. 452]) 
“hypothesis that after a sufficiently long computation in floating-point arithmetic, 
the occurring mantissas have a nearly logarithmic distribution.”) 

All of these previous explanations suffer from two substantial shortcomings. 
First, the previous methods for prescribing frequencies for such sets as “first 
significant digit = 1” are not unique. Such a set does not have a natural density, 
unlike the set of even numbers, say, which has density 1/2 among the integers and 
density 0 among the real numbers, and in general there are many ways of assigning 
a number to the set “first significant digit = d” which are consistent with natural 
density. The explanations mentioned above simply single out particular summation 
or integration techniques that yield the “correct” Benford frequencies. 

The second shortcoming is that, terminology notwithstanding, the past 
frequency-assigning functions leading to (1) are not probabilities, at least not in the 
classical sense. The standard mathematical definition of probability is a [0, 1]-valued 
function P on a domain of sets (called a sigma algebra) closed under 
complements and countable unions, which assigns 1 to the whole set and assigns 
measure $\sum_{n=1}^{\infty} P(A_n)$ to the set $\bigcup_{n=1}^{\infty} A_n$ if the $\{A_n\}$ are disjoint. But the methods 
above necessarily fail to satisfy these conditions, as will, for example, any reasonable 
notion of density on the natural numbers which assigns density 0 to singletons, 
for then $P(\mathbb{N}) = \sum_{n=1}^{\infty} P(\{n\}) = 0 \neq 1$. (This is exactly the same reason for the 
foundational difficulty in making rigorous sense of “pick an integer at random”; 
e.g., see De Finetti [4], pages 86, 98-99.) For the integer-based models of Benford’s 
Law, this difficulty seems insurmountable, and for the above-mentioned 
real-number models either a precise domain for the probability in (1) was not specified 
by Newcomb et al., or when specified was simply not the appropriate collection $\mathcal{A}$. 


THE PROPER PROBABILITY DOMAIN. The first step toward making rigorous 
sense of the First-Digit Law (1) is to identify an appropriate domain for the 
probability. A typical set in the desired collection $\mathcal{A}$ of subsets of $\mathbb{R}^+$ is the set of 
positive reals whose first significant digit (base 10) is 1, namely, 

$$\{D_1 = 1\} = \bigcup_{n=-\infty}^{\infty} [1, 2) \cdot 10^n.$$

This set (along with its analogs from the second, and general nth-digit laws, also 
known to Newcomb and Benford) suggests the following natural domain $\mathcal{A}$ for a 
general significant-digit law. 


Definition. $\mathcal{A}$ is the smallest collection of subsets of the positive reals which 
contains all sets of the form $\bigcup_{n=-\infty}^{\infty} [a, b) \cdot 10^n$, and which is closed under 
complements and countable unions. 

The following properties of $\mathcal{A}$ are easy to check: 

    every non-empty set in $\mathcal{A}$ is infinite, with accumulation points at 0 and at $+\infty$; 
    $\mathcal{A}$ is closed under scalar multiplication, i.e., $a > 0$ and $S \in \mathcal{A} \Rightarrow aS \in \mathcal{A}$; 
    $\mathcal{A}$ is self-similar, in the sense that if $S \in \mathcal{A}$ and $k \in \mathbb{Z}$ then $10^k S = S$. 


For each i = 1, 2, ..., let $D_i : \mathbb{R}^+ \to \{0, 1, \ldots, 9\}$ be the ith significant-digit 
function; for example, $D_1(\pi) = 3$, $D_2(\pi) = 1 = D_2(10\pi)$. It may easily be shown [8] 
that 

$$D_i^{-1}(\{d\}) \in \mathcal{A} \quad \text{for all } i \text{ and } d,$$

and in fact, $\mathcal{A}$ is the smallest such collection (closed under complements and 
countable unions) for which this is true. (In measure-theoretic terms, $\mathcal{A}$ is the 
sigma-algebra generated by $D_1, D_2, \ldots$) This shows that $\mathcal{A}$ is precisely the 
correct domain for a general significant-digit probability law. 
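
To make the significant-digit functions concrete, here is a minimal Python sketch of $D_i$. The name `significant_digit` and the choice to read digits off a `decimal.Decimal` coefficient are our assumptions for illustration; the article does not prescribe an implementation.

```python
import math
from decimal import Decimal


def significant_digit(x, i):
    """Return D_i(x), the i-th significant digit (base 10) of a positive number x."""
    value = x if isinstance(x, Decimal) else Decimal(str(x))
    if value <= 0:
        raise ValueError("x must be positive")
    if i < 1:
        raise ValueError("i must be at least 1")
    # The coefficient digits are stored most significant first, without leading zeros.
    digits = value.as_tuple().digits
    # Positions past the stored digits are zeros in the decimal expansion.
    return digits[i - 1] if i <= len(digits) else 0


if __name__ == "__main__":
    print(significant_digit(math.pi, 1))       # first digit of pi
    print(significant_digit(math.pi, 2))       # second digit of pi
    print(significant_digit(10 * math.pi, 2))  # unchanged by a factor of 10
```

Because only the digit string matters and not the position of the decimal point, the function returns the same value for x and for any 10^k x, which is the self-similarity property listed above.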


THE GENERAL SIGNIFICANT-DIGIT LAW 


General Significant-Digit Law [8]. For all $k \in \mathbb{N}$, all $d_1 \in \{1, 2, \ldots, 9\}$ and all 
$d_j \in \{0, 1, 2, \ldots, 9\}$, $j = 2, \ldots, k$, 

$$P\left(\bigcap_{i=1}^{k} \{D_i = d_i\}\right) = \log_{10}\left[1 + \left(\sum_{i=1}^{k} d_i \cdot 10^{k-i}\right)^{-1}\right]. \tag{2}$$


Observe that this joint significant-digit law (2) includes the First-Digit Law (1) as a 
special case, as well as the other marginal significant-digit laws. 


Example. 

$$P\big((D_1, D_2, D_3) = (3, 1, 4)\big) = \log_{10}\left(1 + \frac{1}{314}\right) \approx .0014.$$

A perhaps surprising corollary of (2) is that 

    the significant digits are dependent 

and not independent as one might expect. For example, from (2) it follows that the 
(unconditional) probability that the second digit is 2 is ≈ .109, but the (conditional) 
probability that the second digit is 2, given that the first digit is 1, is ≈ .115. 
Similarly, the hundredth significant digit is also dependent on the first few significant 
digits, although the dependency decreases as distance between the digits 
increases. It also follows easily from (2) that the distribution of the ith significant 
digit approaches the uniform distribution (where each digit {0,1,...,9} occurs with 
frequency 1/10) exponentially fast as i → ∞. 
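
The numbers quoted in this example can be checked with a short Python sketch of law (2). The helper names (`joint_digit_probability`, `second_digit_probability`, `second_given_first`) are ours, chosen for this illustration only.

```python
import math


def joint_digit_probability(digits):
    """P(D_1 = d_1, ..., D_k = d_k) under the General Significant-Digit Law (2)."""
    digits = tuple(digits)
    if not digits or digits[0] not in range(1, 10):
        raise ValueError("the first digit must be one of 1, ..., 9")
    if any(d not in range(10) for d in digits[1:]):
        raise ValueError("later digits must be one of 0, ..., 9")
    # The integer d_1 d_2 ... d_k equals the sum of d_i * 10^(k - i).
    leading_block = int("".join(str(d) for d in digits))
    return math.log10(1 + 1 / leading_block)


def second_digit_probability(d2):
    """Unconditional P(D_2 = d2), obtained by summing (2) over all first digits."""
    return sum(joint_digit_probability((d1, d2)) for d1 in range(1, 10))


def second_given_first(d2, d1):
    """Conditional P(D_2 = d2 | D_1 = d1)."""
    return joint_digit_probability((d1, d2)) / joint_digit_probability((d1,))


if __name__ == "__main__":
    print(joint_digit_probability((3, 1, 4)))
    print(second_digit_probability(2))
    print(second_given_first(2, 1))
```

The gap between the last two printed values is the dependence between the first and second digits described above.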
What simple hypotheses lead to the General Significant-Digit Law (2)? 


SCALE AND BASE-INVARIANCE. One set of hypotheses which has been popular 
in the past is the notion of scale-invariance, which corresponds to the following 
idea. If the first digits obey some fixed universal distributional law, then this law 
should be independent of the units chosen (e.g., English or metric systems). 
However, as Knuth pointed out (cf. Raimi [11]), there is no scale-invariant probability 
measure on the Borel subsets of $\mathbb{R}^+$, since then the measure of the set (0, 1) must 
be the same as the measure of every interval (0, b), which by countable additivity 
must be 0. 

The problem is simply that the Borel sets (the smallest sigma-algebra containing 
all open intervals) are not the appropriate domain for the significant-digit 
probability law; using $\mathcal{A}$ instead resolves this problem. 

On $\mathcal{A}$, it is easily shown [8] (since the orbit of every point under irrational 
rotation on the circle is asymptotically uniformly distributed) that if P is 
scale-invariant, i.e., if $P(bS) = P(S)$ for all $b > 0$ and all $S \in \mathcal{A}$, then P satisfies (2). 
That is, on the correct domain $\mathcal{A}$, 


scale-invariance implies Benford’s Law. 
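
The result above says that scale-invariance forces the logarithmic law. The following Python sketch illustrates only the easier converse direction, under our own setup: it takes a deterministic grid of values whose base-10 log-mantissas are evenly spread over [0, 1) (a stand-in for the logarithmic distribution), rescales them by an arbitrary factor, and checks that the first-digit frequencies still match (1). The function name and the grid size are assumptions made for this illustration.

```python
import math


def first_digit_frequencies(scale, samples=100_000):
    """First-digit frequencies of scale * X when log10(X) mod 1 is evenly spread on [0, 1)."""
    shift = math.log10(scale)
    counts = [0] * 10
    for k in range(samples):
        u = (k + 0.5) / samples          # log-mantissa of X on a uniform grid
        v = (u + shift) % 1.0            # log-mantissa of scale * X
        digit = min(9, int(10.0 ** v))   # first significant digit; min() guards rounding at v = 1
        counts[digit] += 1
    return [c / samples for c in counts[1:]]


if __name__ == "__main__":
    benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
    for scale in (2.0, math.pi, 0.37):
        freqs = first_digit_frequencies(scale)
        worst = max(abs(f - p) for f, p in zip(freqs, benford))
        print(scale, worst)
```

Multiplying by a constant only rotates the log-mantissa around the circle, so an evenly spread grid stays evenly spread; this is the same rotation picture that the proof in [8] relies on.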


One possible drawback to the scale-invariance hypothesis is the special role 
played by the constant 1. In most tables of physical constants, the constant 1 simply 
does not appear, since the underlying law (say, in f = ma) does not necessitate 
definition of a constant (as opposed to e = mc²). If a “complete” table of physical 
constants included the constant 1, perhaps that special constant would occur with 
strictly positive frequency. But this would preclude scale-invariance, since then 
$0 < P(\{1\}) = P(\{2\}) = \cdots$, contradicting the additivity of a probability. 

As an alternative hypothesis, suppose that any universal significant-digit law 
were base-invariant; i.e., carried over to bases other than 10. (As pointed out in 
[11], all the classical arguments supporting (1) and (2) carry over mutatis mutandis 
to other bases such as 2, or 7 or 100.) 

To motivate a formal definition of base-invariance, consider the set of positive 
numbers S with first significant digit (base 10) less than 5. Using the decimal 
notation $D_1$ as above, and letting $D_1^{(100)}$ denote the first significant digit base 100, 
it is easily seen that 

$$S = \{1 \le D_1 < 5\} = \{1 \le D_1^{(100)} < 5\} \cup \{10 \le D_1^{(100)} < 50\},$$

which says that graphically (as a subset of $[1, b)$), the same set S is 

$$[1, b^a) \quad \text{if } b = 10$$

and is 

$$[1, b^{a/2}) \cup [b^{1/2}, b^{(1+a)/2}) \quad \text{if } b = 100,$$

(where $a = \log_{10} 5$). Hence if a probability P on $\mathcal{A}$ is “base-invariant,” the 
measures of these two S-representing subsets of $[1, b)$ should be the same, i.e., 

$$P([1, b^a)) = P([1, b^{a/2})) + P([b^{1/2}, b^{(1+a)/2})),$$

and similarly for higher power bases $b^n$. This suggests the following definition. 


Definition. [8] P is base-invariant on $\mathcal{A}$ if 

$$P([1, 10^a)) = \sum_{k=0}^{n-1} P\big([10^{k/n}, 10^{(k+a)/n})\big) \quad \text{for all } n \in \mathbb{N} \text{ and all } a \in (0, 1).$$


Letting $P_L$ be the logarithmic probability defined in (2) and $P_1$ be the degenerate 
probability which assigns mass 1 to the constant 1 (or formally, to the set 
$\bigcup_{n=-\infty}^{\infty} \{10^n\}$ in $\mathcal{A}$), it now follows [8], using a slightly deeper result from ergodic 
theory concerning invariant measures on the circle, that 

$$P \text{ is base-invariant} \iff P = qP_1 + (1 - q)P_L \text{ for some } q \in [0, 1].$$

Corollaries are: 

    the logarithmic distribution (2) is the unique continuous base-invariant distribution 

and 

    scale-invariance implies base-invariance. 

(Observe that base-invariance does not imply scale-invariance, since $P_1$ is base- but 
not scale-invariant.) Thus, if there is a universal significant-digit law and it is 
base-invariant, then the special constant 1 occurs with possibly positive probability 
q, and otherwise (with probability 1 − q) the digits satisfy the logarithmic 
distribution (2). 
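
The base-invariance identity can be checked numerically for the two extreme measures and for a mixture. In the Python sketch below, each measure is represented (our assumption, for illustration) by a function giving the probability that the base-10 mantissa lies in an interval [lo, hi) inside [1, 10); a measure with uniformly distributed mantissa is included as a comparison that fails the identity. All function names are hypothetical.

```python
import math


def logarithmic(lo, hi):
    """P_L of the set of positive reals whose mantissa lies in [lo, hi)."""
    return math.log10(hi) - math.log10(lo)


def point_mass_at_one(lo, hi):
    """P_1 of the same set: all of the mass sits on mantissa 1."""
    return 1.0 if lo <= 1.0 < hi else 0.0


def uniform_mantissa(lo, hi):
    """Comparison measure whose mantissa is uniform on [1, 10); not base-invariant."""
    return (hi - lo) / 9.0


def base_invariance_gap(measure, n, a):
    """Left side minus right side of the base-invariance identity for given n and a."""
    left = measure(1.0, 10.0 ** a)
    right = sum(measure(10.0 ** (k / n), 10.0 ** ((k + a) / n)) for k in range(n))
    return left - right


if __name__ == "__main__":
    q = 0.3  # arbitrary mixing weight for the illustration

    def mixture(lo, hi):
        return q * point_mass_at_one(lo, hi) + (1 - q) * logarithmic(lo, hi)

    for name, measure in [("P_L", logarithmic), ("P_1", point_mass_at_one),
                          ("mixture", mixture), ("uniform", uniform_mantissa)]:
        print(name, base_invariance_gap(measure, n=2, a=0.5))
```

The first three gaps vanish up to rounding, while the uniform-mantissa measure leaves a visible gap, in line with the characterization above.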


APPLICATIONS 


Computer design and analysis of roundoff errors. Hamming [6] has given applica- 
tions of Benford’s Law to the problem of placing the decimal (binary) point in the 
number system of a computer in order to minimize the number of normalization 
shifts after the computation of a product, to the problem of estimation of the 
representation error of numbers in base 2 and base 16, and to the problem of 
roundoff error propagation. Schatte [12] similarly concludes that the choice of a 
binary-power base b = 2’ can be guided by the hypothesis of logarithmic distribu- 
tion (cf. Benford’s Law) of mantissa errors; for example, he argues that base 
b = 2° is optimal with respect to storage use. 


Statistical Tests for “Naturalness.” Varian [13] has proposed using Benford’s Law 
as a test of “reasonableness” for data, by checking forecasts of a mathematical 
model as to goodness of fit to Benford’s Law. He used this idea to check specific 
models for economic production and for forecasts of acres of land in various use, 
and Becker [1] used Benford’s Law to check lists of failure rates to detect 
systematic errors. The underlying idea in these applications is that if “real life 
data” obeys Benford’s Law, then so should good mathematical models. 


Making Money in Numbers Games. In the Massachusetts Numbers Game [cf. 3], 
players first bet on a four-digit number of their choice, next a single four-digit 
number is generated randomly by an umpire, and then all players with the winning 
number share the (tax-reduced) pot equally. In such a situation it is obviously 
advantageous to identify numbers which few people choose, since all numbers are 
equally likely to be winners and the expected payoff for an unpopular number is 
thus higher than that for a number which many people have chosen. Now if 
people choose numbers from their experience, and if the numbers in their 
experience obey Benford’s Law, then it makes sense to pick numbers inversely to 
Benford’s Law, i.e., numbers starting with 9 or 8. Of Chernoff’s [3] 33 statistically 
obtained numbers in his “first system” (numbers with predicted normalized 
payoffs exceeding 1.0) for playing the Massachusetts Numbers Game, 16 had first 
significant digit 8 or 9, and only 1 had first significant digit 1 or 2. (Additional 
evidence that numbers “randomly” generated by people tend to start with low 
digits is found in Hill [7].) Since Chernoff also concluded that the public learns 
quickly, this suggests using inverse-Benford as an initial strategy when a new 
numbers game is initiated, and then quitting play soon thereafter. 


Outfoxing the Internal Revenue Service. In his Ph.D. thesis, Nigrini [10] has sug- 
gested that the IRS use Benford’s Law as a test for detecting fraud, such as 
falsification of data by a taxpayer at the time of filing his return. Nigrini’s 
hypothesis is that true data gives a rough approximation to Benford’s Law, whereas 
a Benford-ignorant cheater tends to concoct numbers according to some other 
distribution, say uniform via a standard random number generator, or more likely, 
a subconscious personal favorite generated mentally. Nigrini proposes that the IRS 
simply check for goodness-of-fit against Benford, and then audit the worst fits. 
This suggests that a “creative” and Benford-wise taxpayer should modify or 
generate his fabricated data according to a Benford-like distribution. 
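
A goodness-of-fit check of the kind described here can be sketched in a few lines of Python. The article does not say which statistic Nigrini uses, so the Pearson chi-square statistic below is our assumption, as are the function names; the sample data (powers of 2 and the integers 100 to 999) are ours as well.

```python
import math
from decimal import Decimal


def first_digit(x):
    """First significant digit (base 10) of a nonzero number."""
    return Decimal(str(abs(x))).as_tuple().digits[0]


def benford_chi_square(values):
    """Pearson chi-square statistic of observed first digits against law (1)."""
    counts = [0] * 10
    for v in values:
        if v != 0:                       # zero has no significant digit
            counts[first_digit(v)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one nonzero value")
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    powers_of_two = [2 ** k for k in range(100)]   # known to follow (1) closely
    evenly_spread = range(100, 1000)               # every first digit equally often
    print(benford_chi_square(powers_of_two))
    print(benford_chi_square(evenly_spread))
```

With nine digit classes the statistic has 8 degrees of freedom, so values above roughly 15.5 would be flagged at the 5% level; data with uniformly distributed first digits land far above that threshold.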


ACKNOWLEDGMENT. The author is grateful to Professors Bob Foley and Ron Fox for several useful 
suggestions and Goran Hognas for pointing out the self-similarity of $\mathcal{A}$. 


“...in the current state of analysis we 
may regard the discussion [of past mathematics] 
as tasteless, for they concern forgotten 
methods, which have given way to 
other more simple and more general. 
However, such discussions may yet retain 
some interest for those who like to follow 
step by step the progress of analysis, and 
to see how simple and general methods 
are born from particular questions and 
complicated and indirect procedures.” 

—J. L. Lagrange 


Exploring the Brachistochrone Problem 


LaDawn Haws and Terry Kiser 


1. MAKING THE BRACHISTOCHRONE ACCESSIBLE. In light of the attention 
given to a national crisis in mathematics education, concerned mathematics in- 
structors are always looking for innovative ways to present and reinforce ideas. For 
a generation that grew up with fast paced MTV and special effects movies like Star 
Wars, the classroom may appear to be a fairly dull environment with uncompro- 
mising standards. Computer technology can help educators compete for students’ 
attention and at the same time enhance the learning process by 


1) bringing an added dimension—visualization—to the presentation of mathe- 
matical concepts, 

2) giving students greater flexibility to explore and discover ideas on their own, 

3) making more advanced topics accessible to a wider range of classes. 


These learning aspects will be discussed in the context of some Mathematica 
packages for exploring the classic Brachistochrone problem and interesting varia- 
tions. 

The Brachistochrone Problem, to find the curve joining two points along which 
a frictionless bead will descend in minimal time, is typically introduced in an 
advanced course on the Calculus of Variations. The statement of this problem is 
easily understood, even for high school students, when phrased in a more familiar 
context as follows: “What shape should a roller coaster track have so the car will 
travel from a high point A to a low point B as fast as possible?” This form of the 
statement of the problem, however, has resulted in some unexpected and amusing 
responses from students who were asked to draw what they thought would be a 
“fast track.” 

The authors have written a Mathematica command called Race that allows 
students to explore this problem graphically. We have developed several activities 
and exercises for students with a wide variety of mathematical abilities, from 
algebra to differential equations. The student may design a path, or several paths, 
and Race will produce a plot of the paths, their lengths, times of descent and, 
optionally, an animation of beads racing down the paths. 

The exciting part is that Race enables students to experiment on their own with 
different shaped curves to gain intuition and formulate criteria for a “fast curve” 
without needing the mathematical expertise to solve the problem. Articulating 
criteria such as, 


• the curve should start out with a steep descent to build up velocity quickly, but 
• the steep part should not be “too long” or the advantage gained in increased 
  acceleration will be lost in increased path length, 


requires a good understanding of slope, velocity, acceleration, and arc-length; 
fairly sophisticated stuff for pre-calculus or even high school students! 

[Plot: three paths from the origin to (3.14, 2.), drawn against the x axis and y axis.] 

    Path               Length   Time      Comparison with fastest raced 
    Line (top)         3.724    0.65657   18.5% slower 
    Cycloid (middle)   4.       0.55386   0% (fastest) 
    Circle (bottom)    4.432    0.56791   2.54% slower 

Figure 1. A Mathematica simulation generated by Race 


The well known solution curve, the cycloid, makes the optimal compromise 
between steepness and path length and is easily derived from the Euler-Lagrange 
equation. Of course, this tool is not available in most undergraduate mathematics 
classes, but that does not mean the underlying problem is inaccessible to these 
students. Instead of simply presenting the cycloid as an interesting example of a 
parametric curve, as is typically done in a 1st or 2nd semester calculus class, its 
special features can be developed—in fact, the students can discover them for 
themselves. 
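
For the straight line and the cycloid, the frictionless descent times have closed forms, so the times reported in Figure 1 can be reproduced without Race. The Python sketch below is ours, not the authors' Mathematica code; in particular the value g = 32.174 ft/s² is an assumption inferred from the figure's times, and the function names are hypothetical.

```python
import math

G = 32.174  # assumed gravitational acceleration (ft/s^2), inferred from Figure 1


def line_descent_time(a, b, g=G):
    """Descent time from rest along the straight line from (0, 0) to (a, b), b > 0 downward."""
    length = math.hypot(a, b)
    acceleration = g * b / length        # component of gravity along the incline
    return math.sqrt(2.0 * length / acceleration)


def cycloid_descent_time(radius, theta_end, g=G):
    """Descent time along x = r(t - sin t), y = r(1 - cos t) for 0 <= t <= theta_end."""
    # On a cycloid starting at its cusp the parameter advances at a constant rate.
    return theta_end * math.sqrt(radius / g)


if __name__ == "__main__":
    t_line = line_descent_time(math.pi, 2.0)
    t_cycloid = cycloid_descent_time(1.0, math.pi)   # r = 1 reaches (pi, 2) at t = pi
    print(t_line, t_cycloid, 100.0 * (t_line / t_cycloid - 1.0))
```

The cycloid through (π, 2) is the half arch with generating radius 1, which is why its length in Figure 1 is exactly 4.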

Students in a beginning differential equations course can understand the 
derivation and solution of the differential equation governing the Brachistochrone. 
They are already familiar with minimization criteria (to minimize f(x), consider 
solutions to f’(x) = 0), so the Euler-Lagrange equation is not hard for them to 
swallow and should not be a deterrent from investigating this application. 

Use of Mathematica as a basis for exploring the Brachistochrone problem is a 
prime example of how technology can allow students to go beyond standard 
textbook applications and address more realistic or current applications. For 
example, a natural question to ask concerning the Brachistochrone problem is 
what happens if friction or air resistance is included in the model? This is 
discussed later in this article. The messy calculations that are typical in many 
real-world applications and (up to now) made them off limits in the undergraduate 
classroom can be handled by the computer, with possibly some surprising results to 
student and instructor alike. 

There are many applications that computer technology now makes accessible at 
all levels. One challenge to all of us as educators is to make creative use of this 
technology. The Mathematica command Race and accompanying packages along 
with a notebook of examples is available upon request. This article will conclude 
with some specific examples illustrating these exercises. 


2. PRE-CALCULUS AND CALCULUS. An engaging way to present this topic is 
to begin by testing the students’ intuition. We have broken classes up into groups 
and asked them to draw and discuss what they think is the fastest curve. In 
addition, we have created a Mathematica package that allows students to “draw” 
their curve on the computer screen and generates a simulation from a speedier, 
scaled down version of Race (this has been used several times to provide a 
computer lab experience for 7th-9th grade students as a part of a “Math Field 
Day” project held annually at Chico State University). We next give a “live” 
demonstration of marbles racing down wooden ramps, roughly in the shape of a 
cycloid and a straight line. This demonstration generates a great deal of excite- 
ment, which just goes to show that the “old technology” still has its place. It also 
provides a concrete time scale which is necessary for qualifying what constitutes a 
“close” race. Even in our differential equations classes where the majority of the 
students deduce on their own that the marble rolling down the cycloidal ramp will 
beat the marble on the straight line ramp, many are surprised at the margin of 
victory. 

The focus of the presentation should be a graphical exploration of the criteria 
that make the cycloid a fast curve, not on its derivation as the fastest curve. The 
details concerning the cycloid can be adjusted depending on the level of the class. 
We use a Mathematica animation to present the cycloid as the curve generated by 
tracing a point on a rolling circle (due to Stan Wagon at Macalester College) and 
in a Calculus course it is appropriate to derive the parametric equations. Even 
after the cycloid is presented as the solution to the Brachistochrone problem, there 
are many interesting questions to investigate that depend on physical or graphical 
intuition. For example, a traditional graphing exercise for a pre-calculus class can 
be spiced up by asking for the fastest curve among a class of familiar functions 
(especially appropriate if these functions have recently been studied) but with an 
unknown parameter. This motivates the need to graph several examples of the 
function in question so they can apply their newly gained intuition on what makes 
a fast curve. Then, they can Race their graphs to check their intuition. Test your 
intuition on the examples below. 


2.1. Finding a Fast Parabola. Find the fastest parabola that starts at the origin 
and ends at a given point, say, (3, 2) (in this article we will always take the positive 
y-axis to be oriented downward). Since there are three unknowns in a quadratic, 
we are free to impose one more condition. Let’s take the x-coordinate of the 
vertex, m, to be our unknown parameter. Below are graphs for m = 2, 2.5, 3, and 
3.5. Which of these is the fastest curve and how does this value of m compare with 
the optimal value for m giving the fastest of all such parabolas? The cycloid ending 
at the point (3, 2) is included to add some perspective. The Mathematica generated 
plot of times of descent versus m for a wide range of values of m indicates that the 
fastest of all such parabolas is found with m ≈ 2.5 (not the parabola with its vertex 
at the ending point, which is a popular choice; numerical minimization gives 
m ≈ 2.494). 


[Two panels: the parabolas and the cycloid plotted against the x axis and y axis; times of descent plotted against m.] 

Figure 2. Finding a fast parabola 
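
The parabola experiment can be imitated outside Mathematica. The Python sketch below is our own reconstruction, not the Race package: it writes the parabola through (0, 0) and (3, 2) with vertex at x = m as y = c·x·(2m − x), computes the frictionless descent time by Simpson's rule after the substitution x = s² (which removes the singularity at the start), and minimizes over m with a golden-section search. The value of g, the step counts, and all function names are assumptions.

```python
import math

G = 32.174  # assumed gravitational acceleration (ft/s^2); it does not affect the best m


def simpson(f, lo, hi, steps=1000):
    """Composite Simpson's rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def parabola_descent_time(m, end=(3.0, 2.0), g=G):
    """Descent time along y = c*x*(2m - x) from (0, 0) to `end`, y measured downward."""
    a, b = end
    if 2.0 * m <= a:
        raise ValueError("need 2m > a so the bead stays below its starting height")
    c = b / (a * (2.0 * m - a))          # forces the parabola through (a, b)

    def integrand(s):
        x = s * s                        # substitution x = s^2
        slope = 2.0 * c * (m - x)        # dy/dx
        return 2.0 * math.sqrt((1.0 + slope * slope) / (2.0 * g * c * (2.0 * m - x)))

    return simpson(integrand, 0.0, math.sqrt(a))


def golden_section_minimize(f, lo, hi, tol=1e-4):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for m in (2.0, 2.5, 3.0, 3.5):
        print(m, parabola_descent_time(m))
    print(golden_section_minimize(parabola_descent_time, 1.6, 5.0))
```

The minimizer printed on the last line can be compared with the value m ≈ 2.494 quoted in the text; the search interval [1.6, 5.0] is an arbitrary choice that keeps 2m above the endpoint's x-coordinate.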


2.2. Finding the Fastest Nth Root. This is a good family of curves for exploring 
the trade-off between steepness and path length. Below are graphs of 
$y = 2(x/a)^{1/n}$, for n = 2, 4, and 6. The fastest of these curves is very competitive with 
the cycloid, being only slightly more than 1% slower; this would not be discernible 
with a live model. Can you tell which one it is? 


[Plot: the nth-root curves for n = 2, 4, and 6, drawn against the x axis and ending at (π, 2).] 

Figure 3. Nth roots: steepness vs. path length 


The answer is n = 2. This has always surprised our audiences, both faculty and 
students, especially when a plot of the cycloid is not included. 
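
Readers without Race can check this answer with a short Python sketch. It is our own reconstruction under stated assumptions: the curves end at (π, 2), so a = π; g = 32.174 ft/s² as inferred from Figure 1; and the curve is parametrized by depth, x = a(y/2)^n, with the substitution y = s² to remove the singularity at the start. The function names are hypothetical.

```python
import math

G = 32.174  # assumed gravitational acceleration (ft/s^2), inferred from Figure 1


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def nth_root_descent_time(n, a=math.pi, b=2.0, g=G):
    """Frictionless descent time along y = b*(x/a)**(1/n) from (0, 0) to (a, b)."""

    def integrand(s):
        y = s * s                                    # substitution y = s^2
        dx_dy = (a * n / b) * (y / b) ** (n - 1)     # derivative of x = a*(y/b)**n
        return 2.0 * math.sqrt((1.0 + dx_dy * dx_dy) / (2.0 * g))

    return simpson(integrand, 0.0, math.sqrt(b))


if __name__ == "__main__":
    t_cycloid = math.pi / math.sqrt(G)               # cycloid from (0, 0) to (pi, 2)
    for n in (2, 4, 6):
        t = nth_root_descent_time(n)
        print(n, t, 100.0 * (t / t_cycloid - 1.0))
```

Setting n = 1 turns the curve into the straight line of Figure 1, which gives a convenient sanity check on the integration.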


3. ADDING A SMALL DOSE OF REALITY. Inevitably, any discussion about the 
importance of computer technology to mathematics will bring up the ability to 
address more realistic applications. We strongly agree and yet here we are 
presenting an application that is only valid if it takes place in a vacuum and we 
ignore frictional forces! It is vital that students don’t leave our classes with 
incorrect insight because of the setting we choose to present an application. Is this 
a critical issue for this problem? What happens if kinetic friction or air resistance 
is included in the Brachistochrone model? The Mathematica simulation allows the 
user to include a coefficient of friction to see its effect on the descent time for any 
curve. A typical coefficient of friction, to be denoted as μ, will be less than or 
equal to 0.1. A cycloid ending at the point (π, 2) with μ = .1 is approximately 3% 
slower than a cycloid with no friction. 
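
The size of this effect can be estimated with a short Python sketch. It is not the authors' simulation: we assume the simplified friction model in which the frictional force is μ times the normal component of the weight, so that energy conservation gives v² = 2g(y − μx), and we again take g = 32.174 ft/s² as inferred from Figure 1. The function names are hypothetical.

```python
import math

G = 32.174  # assumed gravitational acceleration (ft/s^2), inferred from Figure 1


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0


def cycloid_time_with_friction(mu, radius=1.0, theta_end=math.pi, g=G):
    """Descent time on x = r(t - sin t), y = r(1 - cos t) with v^2 = 2g(y - mu*x)."""

    def integrand(t):
        if t == 0.0:
            return math.sqrt(radius / g)             # limiting value as t -> 0
        arc_rate = 2.0 * radius * math.sin(t / 2.0)  # ds/dt along the cycloid
        drop = radius * ((1.0 - math.cos(t)) - mu * (t - math.sin(t)))  # y - mu*x
        return arc_rate / math.sqrt(2.0 * g * drop)

    return simpson(integrand, 0.0, theta_end)


if __name__ == "__main__":
    t_free = cycloid_time_with_friction(0.0)
    t_rough = cycloid_time_with_friction(0.1)
    print(t_free, t_rough, 100.0 * (t_rough / t_free - 1.0))
```

Under this simplified model the slowdown for μ = 0.1 comes out at a few percent, the same order as the figure quoted above; a model that also includes the curvature term would give a somewhat different number.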

Although kinetic friction has seemingly little effect on the travel time, the 
question still remains as to what is the effect on the shape of the fastest curve 
when friction is included in the model for the Brachistochrone problem. Is the 
cycloid still the fastest curve? If there is a new fastest curve, is it initially steeper 
than the cycloid lying below it or shallower and lying above it? Physical insight 
alone can play a key role in answering these qualitative questions. The solution 
when kinetic friction is included (this will be derived below for a simplified model 
of friction) has significantly different graphical characteristics and is an interesting 
generalization of the cycloid. 

It is helpful to begin by motivating what graphical and physical insight suggests 
will be the effect of including friction. Depending on the level of the course, the 
solution can be derived or merely presented graphically and compared to other 
curves. The frictional force is assumed to be proportional to the normal compo- 
nent of the weight of the bead and acting in the negative tangential direction (see 
Figure 4). 

[Diagram: a bead on the path ending at (a, b), with unit tangent T, unit normal N, and weight mg, drawn against the x axis and y axis.] 

Figure 4 

Due to the curvature of the path, the normal component of acceleration 
also contributes to the frictional force. We generally neglect this component of the 
friction in our initial discussions and derivations with students in a differential 
equations course. Students are more familiar with the weight component from 
studying inclined planes in physics and this is usually challenge enough. The more 
realistic solution, however, is presented graphically and can lead to interesting 
discussions as to why it differs from the cycloid or the new Brachistochrone using 
the simpler model of friction. This simplified model incorporates some interesting 
qualitative changes and has the additional advantage that the derivation of the 
solution is accessible in an introductory differential equations course since the 
equation of motion remains integrable (a more accurate treatment requires a 
constrained variational technique and, amazingly enough, it can still be solved in 
terms of elementary, albeit, messy functions—see [1]). . 

Before proceeding with a derivation of the solution, let’s develop some qualita- 
tive insight by doing a simple physical analysis, that is, let’s compare the forces 
with friction included versus no friction where we know the shape of the solution is 
a cycloid. Neglecting curvature, the magnitude of the force of friction is less at
steep points on a curve: it is proportional to the normal component of the weight,
which ranges from zero at a vertical tangent to the whole weight
at a horizontal tangent. Since the lesson learned from the classical Brachistochrone
problem, heuristically speaking, is that steepness is most important
initially, this suggests that steepness will now be given more weight (versus path 
length) and the optimal curve, which should still have an initial vertical tangent, 
will be slightly steeper or below the cycloid (at least for the “beginning” portion of 
the curve). Since the normal component of acceleration is proportional to the 
square of the speed, one might expect just the opposite to be the case when it is 
included in the model for friction. Starting off steeper would force more curvature 
for the latter portion of the path when there is a greater velocity. 


3.1. Derivation of the Fastest Curve With Kinetic Friction. Take the starting point 
to be the origin and orient the positive y-axis downward. We seek the fastest curve 
y(x) starting at (0,0) and ending at an arbitrary point (a, b). 

If we ignore friction, then we can apply the conservation of energy, or equivalently,
equate work with change in kinetic energy to obtain $v = \sqrt{2gy}$ where the
velocity $v$ is given as $ds/dt$. Including friction forces us to do a line integral to find
the work or, alternatively, we can start with the equation of motion as follows. At a
point $(x, y)$ on the curve, the unit tangent and normal vectors, illustrated in Figure
4, can be written in terms of arc-length $s$ as

$$\mathbf{T} = \frac{dx}{ds}\,\mathbf{i} + \frac{dy}{ds}\,\mathbf{j} \quad\text{and}\quad \mathbf{N} = -\frac{dy}{ds}\,\mathbf{i} + \frac{dx}{ds}\,\mathbf{j}.$$


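The forces of gravity and friction are given by

$$\mathbf{F}_{\text{gravity}} = mg\,\mathbf{j} \quad\text{and}\quad \mathbf{F}_{\text{friction}} = -\mu\,(\mathbf{F}_{\text{gravity}} \cdot \mathbf{N})\,\mathbf{T} = -\mu m g\,\frac{dx}{ds}\,\mathbf{T}.$$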


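So, the components along the curve (i.e. in the direction of $\mathbf{T}$) are

$$\mathbf{F}_{\text{gravity}} \cdot \mathbf{T} = mg\,\frac{dy}{ds} \quad\text{and}\quad \mathbf{F}_{\text{friction}} \cdot \mathbf{T} = -\mu m g\,\frac{dx}{ds}.$$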


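Using these components in Newton's second law gives

$$m\,\frac{dv}{dt} = mg\,\frac{dy}{ds} - \mu m g\,\frac{dx}{ds}, \tag{1}$$

and substituting

$$\frac{dv}{dt} = \frac{dv}{ds}\,\frac{ds}{dt} = v\,\frac{dv}{ds}$$

into (1) yields, after integration w.r.t. $s$,

$$\tfrac{1}{2}v^2 = g(y - \mu x) \quad\text{or}\quad v = \sqrt{2g(y - \mu x)}.$$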


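Apply the chain rule to $v = ds/dt$ and use the arc-length formula for $ds/dx$ to
solve for $dx/dt$ as a function of $x$, which can be inverted to give the total time,

$$T = \frac{1}{\sqrt{2g}} \int_0^a \left( \frac{1 + (y')^2}{y - \mu x} \right)^{1/2} dx. \tag{2}$$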
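The speed relation $v = \sqrt{2g(y - \mu x)}$ lends itself to a quick numerical experiment. The following Python sketch is not the authors' Mathematica notebook; it is a minimal illustration under the simplified friction model, with a hypothetical function name `cycloid_descent_time` and an assumed value of $g$, that integrates $dt = ds/v$ along the cycloid $x = \rho(\theta - \sin\theta)$, $y = \rho(1 - \cos\theta)$ by the midpoint rule.

```python
import math

def cycloid_descent_time(rho, theta_end, mu, g=9.81, steps=20000):
    """Time to slide from theta = 0 to theta_end along the cycloid
    x = rho*(theta - sin theta), y = rho*(1 - cos theta), y measured downward,
    using the simplified friction model v = sqrt(2 g (y - mu x)).
    The name, g, and the step count are illustrative assumptions."""
    h = theta_end / steps
    total = 0.0
    for i in range(steps):
        th = (i + 0.5) * h                      # midpoint avoids the 0/0 at theta = 0
        x = rho * (th - math.sin(th))
        y = rho * (1.0 - math.cos(th))
        ds = 2.0 * rho * math.sin(th / 2.0)     # ds/dtheta for the cycloid
        v = math.sqrt(2.0 * g * (y - mu * x))   # speed from the energy balance
        total += ds / v * h
    return total

# Cycloid with rho = 1 ending at (pi, 2), i.e. theta_end = pi.
t_free = cycloid_descent_time(1.0, math.pi, 0.0)
t_fric = cycloid_descent_time(1.0, math.pi, 0.1)
print(t_free, t_fric, t_fric / t_free)
```

Without friction the integrand is constant, so the computed time should agree with the classical value $\theta\sqrt{\rho/g}$; with $\mu = 0.1$ the ratio of the two times comes out a few percent above 1.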


Since the computations become quite messy, what follows will just be an outline of
the major steps. Apply the Euler-Lagrange equation,

$$\frac{d}{dx}\left(F_{y'}\right) - F_y = 0,$$

where $F$ is the integrand in equation (2), to obtain the 2nd order differential
equation

$$\left(1 + (y')^2\right)(1 + \mu y') + 2(y - \mu x)\,y'' = 0.$$


Through two substitutions and a partial fractions integration, this can be reduced
to

$$\frac{1 + (y')^2}{(1 + \mu y')^2} = \frac{C}{y - \mu x} \tag{3}$$

for some non-negative constant $C$.

Following the lead from the classical problem, the substitution $y' = \cot(\theta/2)$
into (3) can be used to obtain a parametric solution for the optimal curve.
Denoting the parametrization for the cycloid as

$$x_c(\theta) = \rho(\theta - \sin\theta) \quad\text{and}\quad y_c(\theta) = \rho(1 - \cos\theta),$$

the new fastest curve for the "frictional" Brachistochrone problem can be given in
the form

$$x(\theta) = x_c(\theta) + \mu\rho(1 - \cos\theta), \qquad y(\theta) = y_c(\theta) + \mu\rho(\theta + \sin\theta). \tag{4}$$

The parameterizations in (4) and for the cycloid are valid for a range $0 \le \theta \le \theta_1$,
where $\rho$ and $\theta_1$ must be determined so the curves pass through the ending point
$(a, b)$.

Figure 5 compares this new curve with the cycloid. Note the similar repetitive
pattern with vertical tangents at even multiples of $\pi\rho$; however, the minimums do
not occur at the same place. We have indicated a sloping line at which this new
curve stops as opposed to making it back to the x-axis.


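[Figure 5. A generalization of the Cycloid. The fastest curve with friction is plotted against the x-axis (ticks at $2\pi\rho$, $3\pi\rho$, $4\pi\rho$) and the y axis (tick at $2\rho$), together with the sloping line at which it stops.]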


This line is called the "line of repose" and has the minimum slope, $\mu$, for which
the bead will begin to slide. The solution derived above is not valid for $y < \mu x$.
This places a restriction on allowable ending points, which is consistent with our
physical insight that, due to the loss of energy to friction, the bead can't make it
back to its original height. Given a valid ending point, there is a unique curve of
this form starting at the origin with a vertical tangent. Figure 6, generated from
Race, gives a better comparison of this "frictional" Brachistochrone versus the
classic Brachistochrone, a cycloid. Both were raced with a coefficient of friction
$\mu = 0.1$ and their times are compared to the cycloid without friction.


[Figure 6. A race between Brachistochrones with coefficient of friction $\mu = 0.1$ (plot title: The Brachistochrone Problem With Friction). Cycloid (top): length = 3.648, time 0.77439 (21.7% slower than the cycloid without friction). Fastest with friction (bottom): length = 3.821, time 0.7654 (20.3% slower).]
For all valid ending points, the new fastest curve lies below the cycloid
throughout its entire length, and placing the ending point on the line of repose
gives the greatest distinction between the two curves. With a more accurate
treatment of the frictional forces, as mentioned earlier, the fastest curve lies above
the cycloid for its entire length (see Figure 7 with $\mu = 0.1$) and the ending point
must satisfy the strict inequality $y > \mu x$, since the bead takes infinite time to
reach the line of repose. Consistent with our earlier intuition that the normal
component of acceleration puts a higher penalty on curvature, Figure 7 indicates
that as the coefficient of friction increases the more realistic frictional
Brachistochrone will approach a straight line.


[Figure 7. Realistic "frictional" Brachistochrones approaching a straight line (plot title: Brachistochrones (Realistic Friction Model); the cycloid is shown for comparison).]


The reader should not be misled into thinking that all of the graphical and 
physical intuition that has been developed in this article was known to the authors 
before beginning a computer-aided investigation of the Brachistochrone problem 
and variations. In fact, this is one of the main points of this article; just as the 
authors gained valuable insight through computer generated graphics and simula- 
tions, so will students at all levels increase their mathematical intuition with these 
tools and be provided the opportunity to begin exploring questions they would 
have never even thought of before. 


4. ADDITIONAL PROBLEMS. We have looked at other variations of the classic 
Brachistochrone problem. Many interesting questions can be generated by restrict- 
ing the class of admissible curves, as with the fastest parabola or nth root problems 
given earlier. One that we find particularly enjoyable is the “Two Line” Brachis- 
tochrone problem: Find the “break” point for the fastest two straight line segment 
curve joining the origin and the ending point. A similar problem has appeared 
several times in the problems section of the MAA Monthly; there the question was 
whether or not the time of travel along the two line segments were equal for the 
optimal break point (answered in E1255 [1977, 652]). A program like Mathematica 
makes it easy for students to find the break point numerically, which frees them to 
investigate other questions, such as the problem posed in the Monthly or to look 
for cases where there is a simple algorithm for finding the break point. 

Similar to the constrained variational problem with friction included in the
model is the question of what effect air resistance has on the fastest curve. We are
currently attempting to solve this problem numerically so the solution can be
presented graphically in our classes and compared with other curves, although it
seems likely that the effect of air resistance may be far less significant than the
effect of friction.

The Two Lines problem is included in our Brachistochrone notebook and an 
article concerning the use of Mathematica to investigate more realistic treatments 
of friction and air resistance is in preparation. We invite questions or discussions 
on these problems and will send completed or partially completed materials upon 
request. 
PICTURE PUZZLE
(from the collection of Paul Halmos) 


What conspiracy is this? 
(see page 344.) 


336 EXPLORING THE BRACHISTOCHRONE PROBLEM [April 


Continued Fractions, Chebychev 
Polynomials, and Chaos 


by William Derrick and Jack Eidswick 


1. INTRODUCTION. In this paper we uncover the phenomenon of chaos in 
continued fractions. Our definition of chaos follows that of [4] and our results 
complement those of [3]. 

We begin with a calculator/computer investigation of convergence of continued 
fractions of the following type: 


$$\mathcal{F}(a) = 2 - \cfrac{a}{2 - \cfrac{a}{2 - \cfrac{a}{2 - \cdots}}}. \tag{1}$$


In other words, we calculate the sequence of partial continued fractions $t_1 = 2$,
$t_2 = 2 - a/2$, $t_3 = 2 - a/(2 - a/2), \ldots$ for a fixed value of $a$, and try to make
conjectures based on those calculations. For future reference, we note that the
sequence $\{t_n\}$ can be expressed iteratively by the difference equation

$$t_{n+1} = 2 - \frac{a}{t_n} \qquad (n = 1, 2, \ldots). \tag{2}$$


For arbitrary $t_1$, (2) is a one-parameter family of discrete dynamical systems, and it
is in this context that we will speak of chaos in Sections 6 and 7.

If you have a calculator or computer, you can readily check that $\mathcal{F}(1) = 1$ and
$\mathcal{F}(0.98) = 1.14142135624\ldots$. These answers can also be obtained by assuming
convergence in (2) to $t$ and solving the quadratic $t^2 = 2t - a$. In general, one can
easily prove the following result.


If $a < 1$, then the continued fraction (1) converges to $1 + \sqrt{1 - a}$.
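The calculator experiment described above can be reproduced in a few lines. The following Python sketch is only an illustration (the helper name `partial_fractions` is hypothetical): it iterates $t_{n+1} = 2 - a/t_n$ from $t_1 = 2$, compares the result for $a = 0.98$ with $1 + \sqrt{1 - a}$, and then shows that for $a = 1.02$ the iterates keep wandering.

```python
import math

def partial_fractions(a, count):
    """Return [t_1, ..., t_count] with t_1 = 2 and t_{n+1} = 2 - a / t_n."""
    t = 2.0
    values = [t]
    for _ in range(count - 1):
        t = 2.0 - a / t
        values.append(t)
    return values

a = 0.98
limit = partial_fractions(a, 200)[-1]
print(limit, 1 + math.sqrt(1 - a))

# For a > 1 the same recurrence does not settle down: the iterates drift
# below zero and then jump back above 2.
wandering = partial_fractions(1.02, 60)
print(min(wandering), max(wandering))
```

Floating-point arithmetic is used here for brevity; an iterate that happened to be exactly zero would raise a division error, which does not occur for the values shown.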


But what happens when $a > 1$? Using the quadratic from (2) with $a = 1.02$
yields $\mathcal{F}(1.02) = 1 + i\sqrt{2}/10$, which is clearly impossible because every $t_n$ is real. Thus, these iterations
do not converge. Typical $t_n$ for this continued fraction stay close to 1, but, with
predictable regularity, gradually decrease until negative, then bounce back to a
number greater than 2. A similar phenomenon has been studied for Newton's
method in [10].

What are the cluster points of $\{t_n\}$? The graphs in Figure 1 show the sequence
$\{t_n\}$ in the viewing window $[0, N] \times [-3.1, 3.2]$ and suggest that there may be many
cluster points. On the other hand, graphs away from the line $y = 1$ show very few
points, and suggest a sparsity of such points.
[Figure 1. Two panels showing the sequence $\{t_n\}$, labeled N = 1,000 and N = 10,000.]

While the case $a = 1.02$ may seem peculiar, investigation at other values of
$a > 1$ reveals similar patterns. The graphs in Figure 2 show the sequences
$\{t_n\} = \{t_n(a)\}$ in the viewing window $[0, 100] \times [-3.1, 3.2]$ for the indicated values
of $a$.


[Figure 2. Six panels with the following labels: a = 1.01, period = 30; a = 1.1, period = 10; a = 1.5, period = 45; a = 1.9, period = 30; a = 2.1, period = 35; a = 5, period = 15.]


These graphs suggest that there may be periodic values of a in the vicinities of 
the values shown. Indeed, we will see that this is the case. 

Our goals are (i) to determine which values of a lead to periodicity and (ii) to 
describe the cluster sets for the remaining values of a. 


2. CONTINUED FRACTIONS. For a basic treatment of the subject of continued 
fractions, see [6], [8], or [9]. For a brief history and other information, see [3]. 
Observe that any continued fraction of the form

$$x + \cfrac{y}{x + \cfrac{y}{x + \cfrac{y}{x + \cdots}}} = \frac{x}{2}\left(2 - \cfrac{(-2y/x)}{x + \cfrac{y}{x + \cfrac{y}{x + \cdots}}}\right) = \frac{x}{2}\left(2 - \cfrac{a}{2 - \cfrac{a}{2 - \cdots}}\right) \tag{3}$$

reduces to a continued fraction of the form (1) with $a = -4y/x^2$. We focus only
on continued fractions of the form (1).


3. PERIODIC POINTS OF CONTINUED FRACTIONS. By a periodic point of
(1), we will mean a point $a$ such that, for some natural number $n$,

$$t_{n+k}(a) = t_k(a) \quad\text{for all } k = 1, 2, \ldots. \tag{4}$$

As above in (2), $t_k(a)$ denotes the $k$th partial continued fraction of (1). We will
allow $t_k(a)$ to assume the value $\infty$. For instance, if $a = 2$, then by (2), $t_1(2) = 2$,
$t_2(2) = 1$, $t_3(2) = 0$, $t_4(2) = \infty$, and $t_5(2) = 2$; thus, $a = 2$ is a periodic point of
period 4.


Theorem 1. A number $a$ is a periodic point of (1) if and only if $a - 1$ is a zero of the
polynomial

$$P_n(x) = \sum_{k=0}^{[(n-1)/2]} (-1)^k \binom{n}{2k+1} x^k \tag{5}$$

where $[\ ]$ denotes the integer part of the number enclosed.

Lemma 1. If $a$ satisfies (4), then $t_{n-1}(a) = 0$.

Lemma 2. If $P_n(x)$ is defined by (5), then

$$P_{n+2}(x) = 2P_{n+1}(x) - (x + 1)P_n(x), \qquad P_1(x) = 1, \quad P_2(x) = 2. \tag{6}$$

Lemma 3. $t_n(a) = P_{n+1}(a - 1)/P_n(a - 1)$ for $n = 1, 2, \ldots$


Proof of Theorem 1: If $a$ is a periodic point, then $a$ satisfies (4) for some $n$, and, by
Lemma 1, $t_{n-1}(a) = 0$. By back-substituting in (6), we see that not both $P_n(a - 1)$
and $P_{n-1}(a - 1)$ can be zero. Therefore, $P_n(a - 1) = 0$ by Lemma 3. Conversely,
if $P_n(a - 1) = 0$, then $t_{n-1}(a) = 0$ by Lemma 3, from which it follows that $a$ is a
periodic point. ∎
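The recurrence of Lemma 2 and the quotient formula of Lemma 3 are easy to check with exact rational arithmetic. The following Python sketch is an illustration only; the function names `P` and `partial` are hypothetical, and the value $\infty$ is represented by `None`.

```python
from fractions import Fraction

def P(n, x):
    """P_n(x) from the recurrence P_1 = 1, P_2 = 2,
    P_{n+2} = 2 P_{n+1} - (x + 1) P_n."""
    prev, cur = Fraction(1), Fraction(2)
    if n == 1:
        return prev
    for _ in range(n - 2):
        prev, cur = cur, 2 * cur - (x + 1) * prev
    return cur

def partial(n, a):
    """t_n(a) by direct iteration; None stands for the value infinity."""
    t = Fraction(2)
    for _ in range(n - 1):
        if t is None:          # infinity maps to 2 - a/infinity = 2
            t = Fraction(2)
        elif t == 0:           # 2 - a/0 is treated as infinity
            t = None
        else:
            t = 2 - a / t
    return t

a = Fraction(2)
print([partial(n, a) for n in range(1, 6)])   # t_1, ..., t_5 for a = 2
print(P(4, a - 1))                            # P_4(a - 1) for a = 2
```

For $a = 2$ the iteration visits $2, 1, 0, \infty, 2$, and $P_4(1)$ vanishes, in agreement with the period-4 example in the text.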


Proof of Lemma 1: If (4) holds, $t_{n+1}(a) = t_1(a) = 2$, and, by (2), $t_n(a) = \infty$ and
$t_{n-1}(a) = 0$. ∎


Proof of Lemma 2: If $P_n(x)$ is defined by (5), then

$$Q_n(x) = P_{n+1}(x) - P_n(x) = \sum_{k=0}^{[n/2]} (-1)^k \binom{n}{2k} x^k \quad\text{and}$$

$$Q_{n+1}(x) - Q_n(x) = -x \sum_{k=0}^{[(n-1)/2]} (-1)^k \binom{n}{2k+1} x^k = -xP_n(x). \tag{7}$$

Therefore, $(P_{n+2}(x) - P_{n+1}(x)) - (P_{n+1}(x) - P_n(x)) = -xP_n(x)$, from which (6)
follows. ∎
Proof of Lemma 3: By (2), (6), and induction,

$$t_{n+1}(a) = 2 - \frac{a}{t_n(a)} = 2 - \frac{aP_n(a-1)}{P_{n+1}(a-1)} = \frac{P_{n+2}(a-1)}{P_{n+1}(a-1)}.$$

∎


The polynomials $P_n(x)$ and $Q_n(x)$ satisfy several interesting identities.


Theorem 2.
(i) $P_n(x) = P_k(x)P_{n-k+1}(x) - (x + 1)P_{k-1}(x)P_{n-k}(x)$.
(ii) $Q_n(x) = P_k(x)Q_{n-k+1}(x) - (x + 1)P_{k-1}(x)Q_{n-k}(x)$.
(iii) $Q_n^2(x) - xP_n^2(x) = Q_{2n}(x)$.
(iv) $Q_n^2(x) + xP_n^2(x) = (1 + x)^n$.
(v) $P_{2n}(x) = 2P_n(x)Q_n(x)$.

Proof of Theorem 2:
(i) For $k = 2$, this is (6). Then by induction,

$$\begin{aligned} P_n &= P_{k-1}P_{n-k+2} - (x+1)P_{k-2}P_{n-k+1} \\ &= P_{k-1}\left[P_2P_{n-k+1} - (x+1)P_1P_{n-k}\right] - (x+1)P_{k-2}P_{n-k+1} \\ &= P_{n-k+1}\left[P_2P_{k-1} - (x+1)P_1P_{k-2}\right] - (x+1)P_{k-1}P_{n-k}. \end{aligned}$$

(ii) By (7), $(Q_n - Q_{n-1}) - (Q_{n-1} - Q_{n-2}) = -xQ_{n-2}$, so that
$Q_n = P_2Q_{n-1} - (x + 1)P_1Q_{n-2}$. As in (i), the result follows by induction.
(iii) Let $n = 2k$ in (ii) to get

$$\begin{aligned} Q_{2k} &= P_kQ_{k+1} - (x+1)P_{k-1}Q_k \\ &= P_k(Q_{k+1} - Q_k) + Q_k\left(2P_k - (x+1)P_{k-1}\right) - P_kQ_k \end{aligned}$$

from which the identity follows by (6) and (7).
(iv) From (5) and (7), it follows that $Q_n(x) + i\sqrt{x}\,P_n(x) = (1 + i\sqrt{x})^n$.
(v) Set $n = 2k$ in (i) and use (ii) to obtain

$$P_{2k} = P_k\left[P_{k+1} - (x+1)P_{k-1}\right] = 2P_k\left[2P_k - (x+1)P_{k-1} - P_k\right] = 2P_kQ_k.$$

The identities (iii)—(v) in Theorem 2 bear a surprising resemblance to trigono- 
metric identities if we make the substitution 


This resemblance is no accident as we shall see in the next section. 

4. CHEBYCHEV POLYNOMIALS. The Chebychev polynomials of the first and
second kinds are defined, respectively, by

$$T_n(\cos\theta) = \cos n\theta \quad\text{and}\quad U_n(\cos\theta) = \frac{\sin(n+1)\theta}{\sin\theta} \qquad\text{for } n = 1, 2, \ldots. \tag{8}$$
From these, one may derive the following explicit representations:

$$T_n(x) = \sum_{k=0}^{[n/2]} \binom{n}{2k} x^{n-2k}(x^2 - 1)^k, \qquad U_n(x) = \sum_{k=0}^{[n/2]} \binom{n+1}{2k+1} x^{n-2k}(x^2 - 1)^k. \tag{9}$$


Chebychev polynomials, along with Jacobi polynomials, Gegenbauer polynomials,
Hermite polynomials, Laguerre polynomials, and Legendre polynomials, share
the distinction of being known as "classical polynomials". A great deal is known
about these polynomials; see, e.g., [7, p. 207, 257], and [5]. For our purposes, we
only need the above representations.


Theorem 3. The set of periodic points of (1) is equal to

$$\mathcal{P} = \left\{ 1 + \tan^2\frac{k\pi}{n+1} : k = 1, 2, \ldots, [n/2];\ n = 1, 2, 3, \ldots \right\}.$$


Corollary 1. The periodic points of (1) are dense in the interval $(1, \infty)$.


Proof of Theorem 3: By (5) and (9), $x^n P_{n+1}((1 - x^2)/x^2) = U_n(x)$, and, therefore,
by (8),

$$\cos^n\theta \cdot P_{n+1}(\tan^2\theta) = \frac{\sin(n+1)\theta}{\sin\theta} \tag{10}$$

from which it follows that the zeros of $P_{n+1}(x)$ are $x = \tan^2(k\pi/(n+1))$, $k =
1, 2, \ldots, [n/2]$. ∎

Proof of Corollary 1: Since the set

$$\{k\pi/(n+1) : k = 1, 2, \ldots, [n/2];\ n = 1, 2, \ldots\}$$

is dense in $(0, \pi/2)$ and the function $1 + \tan^2 x$ is continuous in $(0, \pi/2)$, the
periodic points are dense in $(1, \infty)$. ∎


5. NONPERIODIC POINTS OF CONTINUED FRACTIONS. We will now determine
the cluster sets of $\{t_n(a)\}$ for nonperiodic values of $a$.


Theorem 4. If $a > 1$ is a nonperiodic point of (1), then the sequence $\{t_n(a)\}$ of
partial continued fractions clusters at every real number.


Lemma 4. If $0 < \alpha < \pi/2$ and $\alpha \notin \{k\pi/(n+1) : k = 1, 2, \ldots, [n/2];\ n =
1, 2, \ldots\}$, then the set $\{\tan(n\alpha) : n = 1, 2, \ldots\}$ is dense in $(-\infty, \infty)$.


Proof of Lemma 4: The lemma is clearly equivalent to the following statement:


If $e^{in\alpha} \ne 1$ for all $n = 1, 2, \ldots$, then $S = \{e^{in\alpha} : n = 1, 2, \ldots\}$ is dense in the
unit circle $T$.


To prove this statement, suppose $T - S$ contains an arc $A = \{e^{it} : t_1 < t < t_2\}$ and
consider arcs $A_+ = \{e^{it} : 0 < t < t_2 - t_1\}$ and $A_- = \{e^{-it} : 0 < t < t_2 - t_1\}$. Since
arcs joining successive powers of $e^{i\alpha}$ all have length equal to $\alpha$, $t_2 - t_1$ is
necessarily less than $\alpha$. The arc $A_+$ can contain no point of the form $e^{in\alpha}$ because,
if it did, we could write $e^{in\alpha} = e^{i\beta}$, where $\beta < t_2 - t_1$, from which we would get
that some power of $e^{i\beta}$ would be in $A$; i.e., $(e^{in\alpha})^k = e^{ikn\alpha}$ would be in $A$ for
some $k$, contrary to our assumption. Similarly, $A_-$ can contain no point of the
form $e^{in\alpha}$. In other words, $T - S$ contains $A_+ \cup A_-$. If, in addition, we have
$e^{in\alpha} \ne 1$ for $n = 1, 2, \ldots$, then $T - S$ contains $\{e^{it} : |t| < t_2 - t_1\}$, an arc having
twice the length of $A$. Repeating this process gives the existence of an arc in $T - S$
which has length greater than $\alpha$, contrary to the above observation. The lemma
follows. ∎


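Proof of Theorem 4: By Lemma 3 and (10),

$$t_n(a) = t_n(\tan^2\alpha + 1) = \frac{P_{n+1}(\tan^2\alpha)}{P_n(\tan^2\alpha)} = \frac{\sin(n+1)\alpha}{\sin(n\alpha)\cos\alpha} = 1 + \frac{\tan\alpha}{\tan(n\alpha)},$$

where $\alpha = \tan^{-1}\sqrt{a - 1} > 0$ and

$$\alpha \notin \left\{ \frac{k\pi}{n+1} : k = 1, 2, \ldots, [n/2];\ n = 1, 2, \ldots \right\}.$$

Therefore, by Lemma 4, the points $t_n(a)$ are dense in $(-\infty, \infty)$. ∎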
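The closed form $t_n(a) = 1 + \tan\alpha/\tan(n\alpha)$ with $\alpha = \tan^{-1}\sqrt{a - 1}$ can be compared numerically with the recurrence (2). The following Python sketch is an illustrative check only (the function names are hypothetical), using $a = 1.5$ as an arbitrary sample value.

```python
import math

def closed_form(n, a):
    """t_n(a) = 1 + tan(alpha) / tan(n * alpha), with alpha = arctan(sqrt(a - 1))."""
    alpha = math.atan(math.sqrt(a - 1.0))
    return 1.0 + math.tan(alpha) / math.tan(n * alpha)

def iterate(n, a):
    """t_n(a) from t_1 = 2 and t_{n+1} = 2 - a / t_n."""
    t = 2.0
    for _ in range(n - 1):
        t = 2.0 - a / t
    return t

a = 1.5
for n in range(1, 11):
    print(n, iterate(n, a), closed_form(n, a))
```

The two columns agree to floating-point accuracy for small $n$; for very large $n$ rounding errors in the recurrence can be amplified, so the comparison is best read as a sanity check rather than a proof.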


6. DYNAMICAL SYSTEMS TERMINOLOGY. If $f$ is a continuous function on a
metric space $X$, then by a discrete dynamical system is meant the family of
sequences

$$x,\ f(x),\ f(f(x)),\ \ldots,\ f^{(n)}(x),\ \ldots$$

where $x \in X$. A point $x$ is called a periodic point of the system if $f^{(n)}(x) = x$ for
some natural number $n$.

After noting the nonexistence of a uniformly accepted definition of chaos and 
quoting Bill Thurston: “to call your field ‘chaotic’ is an admission of defeat from 
the outset”, Devaney [4, p. 17] adopts the following three conditions for chaos: 


1. f has sensitive dependence on initial conditions, 
2. f is topologically transitive, 
3. the periodic points are dense in X. 


Condition 1 means that there exists a positive number $\delta$ such that, for any
$x \in X$ and any neighborhood $U$ of $x$, there exists $y \in U$ and a natural number $n$
such that $\operatorname{dist}(f^{(n)}(x), f^{(n)}(y)) > \delta$. Condition 2 means that for any pair of open
sets $U, V \subset X$, there exists $n$ such that $f^{(n)}(U) \cap V \ne \emptyset$. Conditions 1 and 2 are
the chaotic portions of Devaney's definition. Roughly speaking, Condition 1 says
that if you start with any orbit, then there are other orbits that start arbitrarily
close to it, but eventually stray from it by at least $\delta$ units; and Condition 2 says that
if you start with any two locations, then there is an orbit leading from one to the
other. Condition 3 establishes an element of regularity in the definition. It is
known that Conditions 2 and 3 imply Condition 1 (see [2]).


7. CHAOS AND CONTINUED FRACTIONS. The results of Sections 4 and 5 are 
quite striking and suggest the presence of an underlying chaotic dynamical system. 
Below, we identify a one-parameter family of discrete dynamical systems which 
collectively set a framework for our results. In Theorem 5, we give an analysis of 
these systems according to whether $a \in \mathcal{P}$ or $a \notin \mathcal{P}$, where $\mathcal{P}$ is the set
described in Theorem 3. The results of Theorem 5, along with the above discussion,
justify our contention that the behavior of continued fractions of the form (1)
is chaotic. In part (2) of Theorem 5, we actually show more than what is needed to 
verify the condition of sensitive dependence. What we show is that, in a sense, the 
iterates separate exponentially. 

For $a > 1$, define $T(x) = 2 - a/x$ for $x \in R$, $x \ne 0$, and consider the following
dynamical system:

$$x,\ T(x),\ T(T(x)),\ \ldots,\ T^{(n)}(x),\ \ldots \tag{11}$$


Theorem 5. (1) If $a \in \mathcal{P}$, then (i) $T$ does not have sensitive dependence on initial
conditions, (ii) $T$ is not topologically transitive, and (iii) every real number $x \ne 0$ is a
periodic point.

(2) If $a \notin \mathcal{P}$, then the situation is reversed: (i) $T$ has sensitive dependence on
initial conditions, (ii) $T$ is topologically transitive, and (iii) no real number $x \ne 0$ is a
periodic point.


The following lemma is an easy generalization of Lemma 3. 


Lemma 5. $T^{(n)}(x) = [xP_{n+1}(a - 1) - aP_n(a - 1)]/[P_{n+1}(a - 1) + (x - 2)P_n(a - 1)]$
for $n = 1, 2, 3, \ldots$


Proof of Theorem 5, part (1): Assume $a \in \mathcal{P}$ with period $n$, so that $P_n(a - 1) = 0$
and $P_k(a - 1) \ne 0$ for $k = 1, 2, \ldots, n - 1$. Define $\lambda_k = P_{k+1}(a - 1)/P_k(a - 1)$
for $k = 1, 2, \ldots, n - 1$. Then, by Lemma 5,

$$T^{(k)}(x) = \frac{x\lambda_k - a}{\lambda_k + (x - 2)} \quad\text{for } k = 1, 2, \ldots, n - 1, \quad\text{and}\quad T^{(n)}(x) = x,$$

from which (iii) follows.

Also, for $\delta > 0$, choose $x \ne 2 - \lambda_k$ for $k = 1, 2, \ldots, n - 1$, and $\varepsilon > 0$ so small
that $|T^{(k)}(x) - T^{(k)}(y)| < \delta$ whenever $|x - y| < \varepsilon$ and $k = 1, 2, \ldots, n$ (which is
possible by continuity of $T^{(k)}$ at $x$). Condition (i) follows by periodicity.

Let $\delta = 1$. Then for $x$ and $\varepsilon$ as above, $|x - y| < \varepsilon$, and $m = 1, 2, \ldots$, the
sequence $\{T^{(m)}(y)\}$ is bounded by $\max\{|T^{(k)}(x)| + 1 : k = 1, 2, \ldots, n\}$; hence,
(ii) follows. ∎


Lemma 6. If $a \notin \mathcal{P}$, there exists a sequence $\{\lambda_n\}$, dense in $R$, such that

$$T^{(n)}(x) = \frac{x\lambda_n - a}{\lambda_n + (x - 2)}.$$


Proof of Lemma 6: Define $\alpha = \tan^{-1}\sqrt{a - 1}$. By Lemma 5 and the proof of
Theorem 4, $T^{(n)}(x)$ has the desired form with

$$\lambda_n = \frac{P_{n+1}(\tan^2\alpha)}{P_n(\tan^2\alpha)} = 1 + \frac{\tan\alpha}{\tan(n\alpha)}.$$

The result then follows from Lemma 4. ∎


Proof of Theorem 5, part (2): Assume $a \notin \mathcal{P}$. If $|x| > 3a$, then $|Tx| < 3 < 3a$, so
every sequence $\{T^{(n)}x\}$ intersects the interval $J = [-3a, 3a]$. If we can show


1995] CONTINUED FRACTIONS, CHEBYCHEV POLYNOMIALS, AND CHAOS 343 


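sequences beginning in $J$ have sensitive dependence on initial conditions, then all
sequences will have sensitive dependence as they intersect $J$. Let $0 < \varepsilon <
\min(1, (a - 1)/5)$. Since $\{\lambda_n\}$ is dense in $R$, there exists $N = N(\varepsilon)$ such that for
every $x$ in $J$, we can find $\lambda_n$ such that $\varepsilon/4 < |x - (2 - \lambda_n)| < \varepsilon/2$ and $n \le N$.
Select any $y$ such that $\varepsilon/16 < |x - y| < \varepsilon/8$, then

$$\begin{aligned} |T^{(n)}(x) - T^{(n)}(y)| &= \frac{|x - y|\left[(\lambda_n - 1)^2 + (a - 1)\right]}{|x - (2 - \lambda_n)|\,|y - (2 - \lambda_n)|} > \frac{(\varepsilon/16)(a - 1)}{(\varepsilon/2)(5\varepsilon/8)} = \frac{a - 1}{5\varepsilon} \\ &> 1 > \varepsilon > 8|x - y| = e^{\alpha N}|x - y| \ge e^{\alpha n}|x - y|, \end{aligned}$$

where $\alpha = (\ln 8)/N > 0$. This gives exponential divergence since the Lyapunov
exponent $\alpha$ is positive. See [1, p. 85].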

Also, from Lemma 6 and the fact that $x(2 - x) - a \ne 0$, we see that $\{T^{(n)}(x)\}$ is
dense in $R$ for any $x$ and, therefore, (ii) and (iii) follow. ∎


Remark. If $a$ is complex and $\operatorname{Im} a \ne 0$, it is not hard to show that the continued
fraction (1) converges. Thus, chaos occurs only when $a > 1$.
Answer to Picture Puzzle
(p. 336) 


R. D. Anderson, R. H. Bing, and R. S. Palais 
at a meeting of the Council of the AMS (in 
Toronto, August 1967). 
A Relation Between Partitions and
the Number of Divisors 


Wang Zheng Bing, Robert Fokkink and Wan Fokkink 


A sum of positive natural numbers adding up to n is called a partition of n. For 
instance, 1 + 2 + 4 is a partition of 7. As none of the summands 1, 2,4 are equal, 
this is called a partition into unequal parts. There are five partitions of 7 into 
unequal parts: 


1 + 2 + 4, 1 + 6, 2 + 5, 3 + 4, 7.


Since the partitions 1 + 2 + 4 and 7 contain an odd number of summands, they 
are called odd partitions, whereas the other three partitions are called even. Add 
the smallest numbers of the odd partitions, 1 + 7 = 8, and do the same for the 
smallest numbers of the even partitions, 1 + 2 + 3 = 6. The difference between 
these two sums, 8 — 6 = 2, is exactly the number of divisors of the prime 7. 

In the sequel, p(n) denotes the sum of the smallest numbers of odd partitions
of n minus the sum of the smallest numbers of even partitions of n (all partitions
being into unequal parts), and d(n) denotes the
number of divisors of n. For small numbers n, it is easy to check that p(n) equals
d(n). This is not a coincidence; we shall see that it is a general relation between 
the smallest numbers of partitions into unequal parts and the number of divisors. 


Theorem. p(n) = d(n) for all positive natural numbers n. 
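The statement can be checked by brute force for small n. The following Python sketch is an illustration only (the helper names are hypothetical): it enumerates the partitions of n into unequal parts, forms p(n) as defined above, and compares it with the divisor count d(n).

```python
def distinct_partitions(n, smallest=1):
    """Yield every partition of n into distinct parts, each part >= smallest,
    as an increasing tuple."""
    if n == 0:
        yield ()
        return
    for first in range(smallest, n + 1):
        for rest in distinct_partitions(n - first, first + 1):
            yield (first,) + rest

def p(n):
    """Sum of smallest parts over odd partitions minus the same sum over even ones."""
    total = 0
    for parts in distinct_partitions(n):
        sign = 1 if len(parts) % 2 == 1 else -1
        total += sign * parts[0]
    return total

def d(n):
    """Number of positive divisors of n."""
    return sum(1 for k in range(1, n + 1) if n % k == 0)

print(sorted(distinct_partitions(7)))
print([(n, p(n), d(n)) for n in range(1, 13)])
```

For n = 7 the enumeration returns the five partitions listed in the text, and p(7) = 8 - 6 = 2 = d(7).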


In order to prove this theorem, we introduce the sum of polynomial quotients

$$P_n(X) = \sum_{i=0}^{n-1} \frac{(1 - X^{i+1})(1 - X^{i+2}) \cdots (1 - X^n)}{1 - X^{n-i}}$$

for positive natural numbers $n$. At each consecutive quotient, the degree of the
denominator decreases by one and the leftmost factor in the numerator drops out.
Fix an $m = 1, \ldots, n$. We shall show that the coefficient $a_m$ for $X^m$ in $P_n(X)$
equals $d(m) - p(m)$.

First, we determine the contributions from the separate quotients of $P_n(X)$ to
$a_m$. Fix an $i = 0, \ldots, n - 1$, and replace the denominator $1/(1 - X^{n-i})$ in the $i$th
quotient of $P_n(X)$ by its power series (which converges for $|X| < 1$). Hence, the
$i$th quotient of $P_n(X)$ takes the form

$$(1 - X^{i+1}) \cdots (1 - X^n)\,(1 + X^{n-i} + X^{2(n-i)} + \cdots).$$

Since $m \le n$, the contributions from this product to $a_m$ stem either from
$(1 - X^{i+1}) \cdots (1 - X^n)$ or from $(1 + X^{n-i} + X^{2(n-i)} + \cdots)$: a product of
non-constant terms from both factors has degree at least $(i + 1) + (n - i) = n + 1 > m$.
Now, we collect the contributions to $a_m$ of these two types of terms.
1. Clearly, the series $(1 + X^{n-i} + X^{2(n-i)} + \cdots)$ contributes $+1$ to the
coefficient $a_m$ of $X^m$ if and only if $n - i$ is a divisor of $m$.
As $i$ increases from $0$ to $n - 1$, the number $n - i$ decreases from $n$ to $1$. In
this range there are $d(m)$ numbers which divide $m$, so there are $d(m)$ series
$(1 + X^{n-i} + X^{2(n-i)} + \cdots)$ for $i = 0, \ldots, n - 1$ which contribute $+1$ to
$a_m$. These contributions together sum up to $d(m)$.


2. If we decompose the product $(1 - X^{i+1}) \cdots (1 - X^n)$, this results in terms
$(-1)^j X^{k_1 + \cdots + k_j}$ for all sequences of numbers $i + 1 \le k_1 < \cdots < k_j \le n$.
So this product contributes $+1$ to $a_m$ for each even partition of $m$ with
terms greater than $i$, and it contributes $-1$ to $a_m$ for each odd partition of
$m$ with terms greater than $i$.
So for each even partition of $m$ with smallest term $k$, the products
$(1 - X^{i+1}) \cdots (1 - X^n)$ for $i = 0, \ldots, k - 1$ contribute $+1$ to $a_m$. These
contributions together sum up to $k$. Likewise, for each odd partition of $m$ with
smallest term $k$, the products $(1 - X^{i+1}) \cdots (1 - X^n)$ for $i = 0, \ldots, k - 1$
contribute $-1$ to $a_m$. These contributions together sum up to $-k$. So in
total, these contributions to $a_m$ sum up to $-p(m)$.


Hence, we have found that $a_m$ equals $d(m) - p(m)$ for $m = 1, \ldots, n$. So to prove
our main theorem, it suffices to prove the following proposition, which forces $a_m = 0$ for every $m \ge 1$.


Proposition. $P_n(X)$ equals $n$ for all $n \ge 1$.
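Before reading the proof, the proposition can be spot-checked by evaluating the sum of quotients exactly at a rational value of $X$. The following Python sketch is an illustration only; the function name `P` and the sample values of $X$ are assumptions, chosen so that no denominator $1 - X^k$ vanishes.

```python
from fractions import Fraction

def P(n, X):
    """Evaluate sum_{i=0}^{n-1} (1 - X^(i+1)) ... (1 - X^n) / (1 - X^(n-i)) exactly."""
    total = Fraction(0)
    for i in range(n):
        numerator = Fraction(1)
        for k in range(i + 1, n + 1):
            numerator *= 1 - X ** k
        total += numerator / (1 - X ** (n - i))
    return total

X = Fraction(2, 3)   # any rational with X**k != 1 for k = 1..n works
print([P(n, X) for n in range(1, 8)])
```

A check at finitely many points is of course no substitute for the induction argument; it only confirms that the formula has been transcribed consistently.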


Proof: Note that $P_1(X) = 1$. To prove the proposition, we show that the difference
between $P_{n+1}(X)$ and $P_n(X)$ is equal to 1.
Shifting the index $i$ of the sum $P_n(X)$ by one, $P_{n+1}(X) - P_n(X)$ takes the form

$$\sum_{i=0}^{n} \frac{(1 - X^{i+1}) \cdots (1 - X^{n+1})}{1 - X^{n+1-i}} - \sum_{i=1}^{n} \frac{(1 - X^{i}) \cdots (1 - X^{n})}{1 - X^{n+1-i}}.$$

In the second sum, we can also start the index $i$ at 0, because its quotient for $i = 0$
equals zero. Now, collecting quotients of equal denominator gives

$$\sum_{i=0}^{n} (X^i - X^{n+1})\,\frac{(1 - X^{i+1}) \cdots (1 - X^n)}{1 - X^{n+1-i}}.$$

The denominator $1 - X^{n+1-i}$ divides the factor $X^i - X^{n+1}$ in the numerator, so
this sum equals

$$\sum_{i=0}^{n} X^i (1 - X^{i+1}) \cdots (1 - X^n).$$

Denote this polynomial by $Q_n(X)$. The result will follow if $Q_n(X) = 1$ for all $n$.
Again, we use induction. $Q_1(X) = (1 - X) + X = 1$, and isolating the term with
$i = n + 1$ in the expression for $Q_{n+1}(X)$ yields the relation
$Q_{n+1}(X) = (1 - X^{n+1})Q_n(X) + X^{n+1}$, which provides the inductive step. ∎
Answers to Two Questions Concerning
Quotients of Primes 


Paolo Starni 


We will consider some open questions concerning quotients of primes posed in [1] 
by D. Hobby and D. M. Silberger. 

Corollary 3 of this note will solve Open Problem 2 of their paper, and our
Theorem 1 represents some progress on their Open Problem 1. $R^+$ and $N$ denote,
as usual, the set of all positive real numbers and the set of all positive integers (0
excluded). If $S \subset N$, we indicate with $F(S)$ the set of all quotients $p/q$ for which
$\{p, q\} \subset S$ and $p \ne q$; so, for instance, $F(N) = Q^+ - \{1\}$, where $Q^+$ represents the
set of all positive rational numbers (0 excluded).

Open Problem One is: 


Characterize the family of all $S \subset N$, $S$ infinite, for which $F(S)$ is dense
in $R^+$.


We note that, by the definition of $F(S)$, density in $]0, 1]$ implies density in $R^+$
(if $p/q \in F(S)$, then $q/p \in F(S)$ as well). It is
convenient to assume here the following definition of density (see [2] passim): Let
$X \subset Y \subset R^+$. The set $X$ is dense in $Y$ if and only if for every $y \in Y$ there exists a
sequence $x_n$ in $X$ such that $\lim_{n \to +\infty} x_n = y$.

The following theorem gives a sufficient condition for F(S) to be dense. 


Theorem 1. If there exists a strictly increasing sequence in $S$, $p_n$, such that
$\lim_{n \to +\infty} p_{n-1}/p_n = 1$, then $F(S)$ is dense in $R^+$.
Proof: Let $x \in\ ]0, 1]$ be given, and choose $k$ in such a way that $xp_k > p_1$. For any
$n > k$, we may choose $m$ so that $p_{m-1} < xp_n \le p_m$, making $m$ a function of $n$. We
then have

$$0 \le (p_m - xp_n)/p_n < (p_m - p_{m-1})/p_n \le (p_m - p_{m-1})/p_m = 1 - p_{m-1}/p_m.$$

When $n$ goes to positive infinity, $m$ goes to positive infinity as well, so we have

$$0 \le \lim_{n \to +\infty} (p_m - xp_n)/p_n \le 1 - \lim_{m \to +\infty} (p_{m-1}/p_m) = 1 - 1 = 0.$$

Thus

$$\lim_{n \to +\infty} \left((p_m/p_n) - x\right) = 0 \quad\text{and so}\quad \lim_{n \to +\infty} p_m/p_n = x.$$

∎


From Theorem 1 we deduce the density of $F(S)$ in $R^+$ if $S = \{a + nb;\ a, b \in N,\
n = 1, 2, \ldots\}$. Since the set of all positive integers odd ($O$) or even ($E$) may be
considered as an arithmetic progression, we obtain that $F(O)$ and $F(E)$ are both
dense in $R^+$. So also is $F(N)$, but in this case the result is trivial since it is well
known that $Q^+$ (or $Q^+ - \{1\}$) is dense in $R^+$. Besides, we obtain that:


Corollary 2. $F(P)$ is dense in $R^+$, where $P$ denotes the set of all primes.
(This corollary represents another proof of Theorem 4 in [1].) 


Proof: Let $p_n$ be the sequence of all primes in increasing order. The Prime
Number Theorem (cited also in [1]) implies that asymptotically $p_n \sim n \log n$ (for
the proof see [3], p. 10) and so $\lim_{n \to +\infty} p_{n-1}/p_n = 1$. The proof now follows from
Theorem 1. ∎
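The construction in the proof of Theorem 1 can be tried out directly on the primes. The following Python sketch is an illustration only (the function names and the bound 100,000 are assumptions): for a target $x$ with $0 < x < 1$ it takes the largest available prime as $p_n$ and the first prime $p_m \ge x p_n$, and prints the resulting quotient.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit in increasing order."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            sieve[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n in range(limit + 1) if sieve[n]]

def approximate(x, primes):
    """Follow the proof of Theorem 1 for 0 < x < 1: take the largest available
    p_n and the first p_m with p_m >= x * p_n, and return (p_m, p_n)."""
    p_n = primes[-1]
    target = x * p_n
    p_m = next(p for p in primes if p >= target)
    return p_m, p_n

primes = primes_up_to(100_000)
for x in (0.1, 0.3, 0.5, 0.75):
    p_m, p_n = approximate(x, primes)
    print(x, p_m, p_n, p_m / p_n)
```

The error of each approximation is bounded by the gap between consecutive primes near $x p_n$ divided by $p_n$, which is why enlarging the sieve improves the fit.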


$D(a, b)$ denotes the set of all primes which belong to the arithmetic progression
$\{a + nb;\ a, b \in N,\ n = 1, 2, \ldots\}$. If $a$ and $b$ are coprime, then $D(a, b)$ is infinite
(this is Dirichlet's theorem, cited also in [1]).


Corollary 3. $F(D(a, b))$ is dense in $R^+$, whenever $a$ and $b$ are coprime.


(This statement answers affirmatively the question posed in [1] as Open Problem 
Two.) 


Proof: Let $\pi(x)$ be the number of primes $\le x$. When $a$ and $b$ are coprime we
indicate by $\pi_{a,b}(x)$ the number of primes in the set $\{p_n;\ n \in N\} = D(a, b)$ that do
not exceed $x$. We have that $\pi_{a,b}(x)$ is asymptotically equal to $\pi(x)/\varphi(b)$, where $\varphi$
is Euler's totient function (see [4], p. 214). Also in this case by the Prime Number
Theorem one obtains that asymptotically $p_n \sim \varphi(b)\, n \log n$, so that again
$\lim_{n \to +\infty} p_{n-1}/p_n = 1$, and the proof follows as in
Corollary 2. ∎


Theorem 4. The converse of Theorem 1 is false. 


Proof: Let $S = \bigcup_{n=0}^{\infty} S_n$, where $S_n = \{2^{2n}, 2^{2n} + 1, \ldots, 2^{2n+1} - 1\}$. $F(S)$ is dense
in $\mathbb{R}^+$, but $\lim_{n \to +\infty} (p_{n-1}/p_n) \ne 1$ for any strictly increasing sequence $p_n$ from $S$.
1) To see that $F(S)$ is dense let $x \sim S$ mean that $x$ can be approximated
arbitrarily closely by $p_n/q_n$ with $p_n, q_n \in S$ and $p_n \ne q_n$. We have the following
points.

(i) If $x \sim S$, then $4x,\ x/4 \sim S$.

(ii) If $x \in [1/4, 1/2]$ or $x \in [1/2, 1]$, then $x \sim S$. We use the dyadic expansion
$x = \sum_{j=1}^{\infty} (a_j/2^j)$, $a_j \in \{0, 1\}$.

If $x \in [1/4, 1/2]$, then $x = \frac{1}{4} + \sum_{j=3}^{\infty} (a_j/2^j)$; we must show that
$x_n = \frac{1}{4} + \sum_{j=3}^{2n} (a_j/2^j)$ belongs, for $n \ge 2$, to $F(S)$. We obtain

$$x_n = \frac{2^{2(n-1)} + (a_3 2^{2n-3} + \cdots + a_{2n})}{2^{2n}} = \frac{2^{2(n-1)} + s}{2^{2n}}.$$

Since $0 \le s \le 1 + 2 + \cdots + 2^{2n-3} = 2^{2n-2} - 1 < 2^{2n-2}$, $x_n \in F(S)$.

If $x \in [1/2, 1]$, then $x = \frac{1}{2} + \sum_{j=2}^{\infty} (a_j/2^j)$; when $n \ge 1$

$$x_n = \frac{1}{2} + \sum_{j=2}^{2n+1} \frac{a_j}{2^j} = \frac{2^{2n} + a_2 2^{2n-1} + \cdots + a_{2n+1}}{2^{2n+1}} = \frac{2^{2n} + s_n}{2^{2n+1}},$$

where $0 \le s_n \le 2^{2n-1} + \cdots + 1 = 2^{2n} - 1 < 2^{2n}$; thus

$$y_n = \frac{2^{2n} + s_n}{2^{2n+1} - 1} = x_n \, \frac{2^{2n+1}}{2^{2n+1} - 1} \to x \quad \text{and} \quad y_n \in F(S).$$

(iii) If $x \in ]0, 1]$, then $x \sim S$.

If $x \in [1/2^{2p}, 1/2^{2p-1}]$ for $p \ge 2$, then $2^{2(p-1)} x \in [1/4, 1/2]$ and $x \sim S$ by (i) and (ii).
If $x \in [1/2^{2p+1}, 1/2^{2p}]$ for $p \ge 1$, then $2^{2p} x \in [1/2, 1]$ and $x \sim S$ by (i) and (ii).

Since $F(S)$ is closed under taking reciprocals, (iii) implies that $F(S)$ is dense.


2) For the other half of the proof, note that if $p, q \in S$, $p < q$, then, setting
$p = 2^{2k} + s$ and $q = 2^{2h} + t$ with $0 \le s < 2^{2k}$ and $0 \le t < 2^{2h}$, we get $k \le h$. If $k < h$, then

$$\frac{p}{q} = \frac{2^{2k} + s}{2^{2h} + t} < \frac{2^{2k+1}}{2^{2h}} \le \frac{2^{2h-1}}{2^{2h}} = \frac{1}{2}.$$

Now let $p_n$ be a strictly increasing sequence in $S$.
We assume that $\lim_{n \to +\infty} (p_{n-1}/p_n) = 1$; then it must be that for $n$ sufficiently
large, $\frac{1}{2} < (p_{n-1}/p_n) < 1$. So $k = h$ is constant and that is impossible. ∎


Avoiding the Exchange Lemma


James Ford 


In finite-dimensional vector space theory, before defining dimension (as the size of
a basis) it is necessary to show that all bases have the same number of members.
This is usually done by appealing either to the exchange lemma or to the fact that
a system of fewer than $n$ homogeneous linear equations in $n$ unknowns has a
nontrivial solution. When the underlying field has zero characteristic, however, the
following more direct approach is available.

Let $V$ be a vector space over a field of characteristic zero. A set $\{u_1, u_2, \ldots, u_n\}$
of vectors is a basis for $V$ if each $v \in V$ can be written uniquely in the form
$v = \sum_{i=1}^{n} \lambda_i u_i$, where the $\lambda_i$ are scalars.




Lemma. All bases for V have the same number of members. 


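Proof: Let $\{u_1, u_2, \ldots, u_n\}$ and $\{v_1, v_2, \ldots, v_m\}$ be bases for $V$. Then there exist
unique scalars $a_{ij}, b_{ji}$ ($1 \le i \le n$, $1 \le j \le m$) such that

$$u_i = \sum_{j=1}^{m} a_{ij} v_j, \qquad v_j = \sum_{k=1}^{n} b_{jk} u_k$$

for each $i$ and $j$. It follows that for $1 \le i \le n$

$$u_i = \sum_{j=1}^{m} a_{ij} v_j = \sum_{j=1}^{m} a_{ij} \sum_{k=1}^{n} b_{jk} u_k,$$

i.e.

$$u_i = \sum_{k=1}^{n} \sum_{j=1}^{m} a_{ij} b_{jk} u_k.$$

Since $\{u_1, u_2, \ldots, u_n\}$ is a basis, the two expressions above for $u_i$ must be the same.
Equate the coefficients of $u_i$ on both sides:

$$1 = \sum_{j=1}^{m} a_{ij} b_{ji},$$

and sum over $i$:

$$n = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} b_{ji}.$$

Similarly,

$$m = \sum_{j=1}^{m} \sum_{i=1}^{n} b_{ji} a_{ij}.$$

Since the two double sums are rearrangements of each other, it follows that
$m = n$.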
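
The rearrangement step is the identity $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ for an $n \times m$ matrix $A$ and an $m \times n$ matrix $B$. As a small sanity check, the sketch below evaluates both traces exactly for one pair of rectangular matrices; the entries are arbitrary values chosen for illustration, and `matmul` and `trace` are hypothetical helper names.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

# A is 2 x 3 and B is 3 x 2; the entries are arbitrary rationals.
A = [[Fraction(1), Fraction(2), Fraction(-1)],
     [Fraction(0), Fraction(3), Fraction(1, 2)]]
B = [[Fraction(2), Fraction(1)],
     [Fraction(-1), Fraction(4)],
     [Fraction(5), Fraction(0)]]

tr_AB = trace(matmul(A, B))  # double sum over (i, j) of a_ij * b_ji
tr_BA = trace(matmul(B, A))  # the same terms, summed in the other order
print(tr_AB, tr_BA)
```

In the proof itself the two products are identity matrices of sizes $n$ and $m$, so the common trace is $n$ on one side and $m$ on the other.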
Remarks


1. The proof consists of the observation that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$, where $A =
   [a_{ij}]$ and $B = [b_{ji}]$. Note that if the field had nonzero characteristic then the
   argument would merely establish that the sizes of the two bases differed by a
   multiple of the characteristic.

2. The same argument can be used in an obvious way to define the degree of a
   finite extension of a field of characteristic zero. This leads directly to a proof
   of the double extension theorem: if $E$ is a field of characteristic zero, $F$ is a
   finite extension of $E$ and $G$ is a finite extension of $F$, then $G$ is a finite
   extension of $E$ and $[G : E] = [G : F][F : E]$. This approach can be used to
   demonstrate the impossibility of the duplication of the cube and the trisection
   of the angle using straightedge and compass to students who have no
   knowledge of linear algebra.
Intervals Contained in Arithmetic
Combinations of Sets 


Stephen Silverman 


If $A$ and $B$ are subsets of $\mathbb{R}$, then the set $A + B$ is the set of all sums $a + b$ for
$a \in A$ and $b \in B$. The sets $A - B$, $A \cdot B$, and $A/B$ are defined similarly (with no
dividing by zero).


Theorem (Steinhaus). If $A$ and $B$ have positive measure then $A + B$ and $A - B$
each contain an interval of positive length.


See, for example, [2, p. 143]. This result of Steinhaus has been generalized, by
Weil, to locally compact groups ([1, p. 296]). For the topological group $(\mathbb{R} \setminus \{0\}, \cdot)$
the invariant (Haar) measure of a set $A$ is given by $\int_A |x|^{-1}\,dx$, and thus this
measure has the same sets of positive measure as does Lebesgue measure. Since
an open set in $\mathbb{R} \setminus \{0\}$ is open in $\mathbb{R}$, Steinhaus's theorem could be augmented to
include the sets $A \cdot B$ and $A/B$ in its conclusion as well.

The conclusion doesn't necessarily hold if the sets have measure zero, even if
they are closed and uncountable. The existence of closed, uncountable, and
independent sets is shown in [4, p. 103], where $A$ is independent if whenever $x_i$
are distinct elements of $A$ and $n_i$ are in $\mathbb{Z}$, then $n_1 x_1 + \cdots + n_k x_k = 0$ implies
that each $n_i$ is 0. So if $A$ is independent and $A + A$ contains an interval and $c$ is a
nonzero member of $A$, then there is a rational $p/q$ with $p$ odd and $q$ even so that
$(p/q)c$ is in $A + A$. This means there are $a$ and $b$ in $A$ with $a + b = (p/q)c$ or
$qa + qb - pc = 0$ which contradicts independence.


1995] NOTES 351 


However in the case that $A$ and $B$ are the Cantor set $C$ ([2, p. 71]) we have
$C + C = [0, 2]$ and $C - C = [-1, 1]$. This can be seen by adding and subtracting
members of $C$ in base 3 recalling that $C$ is the set of all reals in $[0, 1]$ with only 0's
and 2's in their ternary representations.

A fascinating observation by Ray Mayer [3] is that $C \cdot C$, though not all of $[0, 1]$
(e.g., 2/5 is missing), does contain intervals, has measure about .80955, and has
countable boundary. The first two facts are not too difficult and make nice Cantor
set exercises.

The purpose of this note is to point out that there is a well known class of sets
(whether measure zero or not) that have the maximum possible sum, difference,
product, and quotient. Our result, which is a further extension of Steinhaus's
theorem and which seems to have gone unnoticed, concerns the "arithmetic" of
dense $G_\delta$ sets.

Recall that $A$ has measure 0 if for each $n > 0$ we can find an open set $U_n$
containing $A$ with total length less than $1/n$. The intersection of the $U_n$ is then a
$G_\delta$ (the countable intersection of open sets) of measure 0. If $A$ also contains all
rational numbers, then it is a dense $G_\delta$ of measure 0.


Theorem. Let $G$ and $H$ be dense $G_\delta$ sets in non-empty open intervals $I$ and $J$
respectively. If $\&$ is any one of the four arithmetic operations $+$, $-$, $\cdot$ or $/$, then

$$G \mathbin{\&} H = I \mathbin{\&} J,$$

except that in the case of multiplication and division 0 might be in $I \mathbin{\&} J$ but not in
$G \mathbin{\&} H$.


Proof: We consider the simplest case first: $I = J = \mathbb{R}$ and $\& = +$. Let $r$ be in $\mathbb{R}$
and $f(x) = r - x$. Since $f$ is a homeomorphism, $f(H)$ is a dense $G_\delta$ set. By the
Baire Category Theorem [2, p. 68] $f(H)$ is of the 2nd category and thus cannot lie
in $\mathbb{R} \setminus G$ which is of first category. This implies that there is an $x \in H$ such that
$f(x)$ is in $G$, but $f(x) + x = r$ so $r \in G + H$, finishing this case.

For arbitrary $I$ and $J$ let $r$ be in $I + J$. Then the set $X = I \cap (r - J)$ is a
non-empty open interval with $f(H) \cap X$ and $G \cap X$ dense $G_\delta$ sets in $X$, and we
can proceed as above. When the operation is subtraction let $f(x) = x + r$, for
multiplication let $f(x) = r/x$, and for division let $f(x) = rx$, with $r \ne 0$ in the
latter two cases, ensuring that $f$ is still a homeomorphism. ∎


One might wonder whether the hypothesis of the theorem can be weakened. 
Consider the following example, which is an application of the above Theorem. 


Example. There exists a set A of 2nd category such that A+ A contains no 
interval. 


In fact we will show the existence (using the axiom of choice) of an independent 
set of 2nd category. 

First, we observe that the cardinality of any dense $G_\delta$ set $G$ is $c$. If $\mathrm{card}(G) = z$,
then $z \le \mathrm{card}(G + G) \le \mathrm{card}(G \times G) = z \cdot z = z$, but by the above theorem
$G + G$ contains an interval so $z = c$. Second, we note that there are exactly $c$
dense $G_\delta$'s. Since there are $c$ open sets and each $G_\delta$ is determined by a countable
sequence of open sets there can be at most $c$ $G_\delta$'s, but for each $r \in \mathbb{R}$, $\mathbb{R} \setminus \{r\}$ is a
dense $G_\delta$, so there are exactly $c$ dense $G_\delta$'s.
Now let $\omega_c$ be the least ordinal with $c$ predecessors and let $\{G_\alpha\}$ with $\alpha < \omega_c$ be
the collection of all dense $G_\delta$'s. Let $g_0$ be in $G_0$, $g_0 \ne 0$, and suppose for
$\alpha < \beta < \omega_c$ we have $g_\alpha$ in $G_\alpha$. Think of $\mathbb{R}$ as a vector space over the rationals and
choose $g_\beta$ in $G_\beta$ with the condition that $g_\beta$ is not in $\mathrm{span}\{g_\alpha : \alpha < \beta\}$. This can be
done since

$$\mathrm{card}(\mathrm{span}\{g_\alpha : \alpha < \beta\}) < c = \mathrm{card}(G_\beta).$$

The set $A = \{g_\alpha : \alpha < \omega_c\}$ is clearly independent. To see that it is of 2nd category
let's assume that it is of 1st category. $A$ is therefore contained in a set $F$ equal to
the countable union of closed nowhere dense sets, hence the complement of $F$ is a
dense $G_\delta$, say $G_\gamma$. Thus $g_\gamma$ is in $G_\gamma$ and this contradiction concludes the
argument.


Remarks: 


1. Is there a set $A$ such that $A + A$ contains an interval but $A - A$ does not?

2. We get the same result for $\& = +$ or $-$ for a locally compact group $(G, +)$
since the category theorem is valid in locally compact Hausdorff spaces.

3. Does the image of a dense $G_\delta$ under a continuous function that is not
constant on any interval contain a dense $G_\delta$ on some interval?

4. I would like to thank Joe Buhler for his valuable input. 
UNSOLVED PROBLEMS


Edited by: Richard Guy and Richard Nowakowski 


In this department the MONTHLY presents easily stated unsolved problems dealing 
with notions ordinarily encountered in undergraduate mathematics. Each problem 
should be accompanied by relevant references (if any are known to the author) and by 


a brief description of known partial or related results. Typescripts should be sent to 
Richard Guy, Department of Mathematics & Statistics, The University of Calgary, 
Alberta, Canada T2N 1N4. 


Does the Möbius Function Determine
Multiplicative Arithmetic?


D. Flath and A. Zulauf 


Is the multiplication law on the positive integers uniquely determined by the values
of the Möbius function and the property that multiplication respects order?

Let us be more precise. The Möbius function $\mu$ is defined on positive integers
by the rule that $\mu(n)$ equals 0 if $n$ is divisible by the square of a prime and $\mu(n)$
equals $(-1)^r$ if $n$ is the product of $r$ distinct primes. We define $\mu$ on free Abelian
semigroups analogously, regarding the generators as primes.


MÖBIUS PROBLEM. Suppose that $A = \{a_1, a_2, a_3, \ldots\}$ is a free Abelian semigroup,
where $a_1$ is the unit element. Do the following two properties imply that
$a_m a_n = a_{mn}$ for every $m$ and $n$?


1. $a < b$ implies $ac < bc$ for $a, b, c \in A$, where $A$ has been given the linear
order $a_1 < a_2 < a_3 < \cdots$.
2. $\mu(a_n) = \mu(n)$ for every $n$.


The best way to understand the problem is to think of $a_n$ as $n$ and use the
Möbius function and the order property to factor the first few positive integers, as
follows.

To begin with, 1 is the unit.

Since $\mu(2) = -1$, 2 must be the product of an odd number of distinct primes.
Since there are not three primes less than 2, 2 must be prime.

Similarly, $\mu(3) = -1$ implies that 3 is prime.

Next, $\mu(4) = 0$, so 4 is the smallest number divisible by the square of a prime; it
must be the square of the smallest prime. Hence $4 = 2^2$.

Next, $\mu(5) = -1$ and since there are only two primes smaller than 5, 5 must be
prime.

Since $\mu(6) = 1$, 6 is the smallest number that is the product of two distinct primes,
so it must be the product of the smallest two. Thus $6 = 2 \cdot 3$.
Next, $\mu(7) = -1$. Can 7 be the product $2 \cdot 3 \cdot 5$? No, because $2 \cdot 3 \cdot 5 > 3 \cdot 5$
and no number less than 7 equals $3 \cdot 5$. Therefore 7 is prime.

Next, $\mu(8) = \mu(9) = 0$. A little logic shows that 8 and 9 must equal $2^3$ and $3^2$,
but there seems to be no way to determine which is which.

Go on to 10. Since $\mu(10) = 1$, it is easy to see that $10 = 2 \cdot 5$.

And so it goes. You can show that 11 is prime, that $12 = 2^2 \cdot 3$, and that 13 is
prime quite easily. But 14 and 15 are $2 \cdot 7$ and $3 \cdot 5$ and there seems to be no way
to tell which is which.

The further you go the more intricate the logic becomes. Some numbers factor
easily, but more and more of them present problems. Let's skip ahead and see how
the uncertainties about 8, 9, 14, and 15 can be resolved.

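Let $Z(n) = \#\{m \le n : \mu(m) = 0\}$. Then

$$Z(2 \cdot 3 \cdot 5) = \#\{2^2 \cdot a,\ 3^2 \cdot b,\ 5^2 : 1 \le a \le r,\ 1 \le b \le 3\} = r + 4 = 10 \text{ or } 11,$$

where $r = 6$ or 7 according as $2 \cdot 7 >$ or $< 3 \cdot 5$. But $Z(n) = 10$ implies that
$n = 27$, and $\mu(27) = 0 \ne \mu(2 \cdot 3 \cdot 5)$. Hence $Z(2 \cdot 3 \cdot 5) = 11$, $r = 7$, $2 \cdot 3 \cdot 5 \ge 29$,
and $2 \cdot 7 < 3 \cdot 5$. Therefore $14 = 2 \cdot 7$ and $15 = 3 \cdot 5$.

Since $3 \cdot 11 > 2 \cdot 3 \cdot 5 \ge 29$ and $\mu(3 \cdot 11) = 1$, we have $3 \cdot 11 \ge 33$. Since
$2^2 \cdot 3^2 > 3 \cdot 11 \ge 33$ and $\mu(2^2 \cdot 3^2) = 0$, we have $2^2 \cdot 3^2 \ge 36$, and hence that
$Z(2^2 \cdot 3^2) \ge Z(36) = 13$. But

$$Z(2^2 \cdot 3^2) = \#\{2^2 \cdot a,\ 3^2 \cdot b,\ 5^2 : 1 \le a \le s,\ 1 \le b \le 3\} = s + 4,$$

where $s = 8$ or 9 according as $2^3 > 3^2$ or $< 3^2$. Therefore $s = 9$, and so $2^3 = 8 <
3^2 = 9$.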
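
The numerical facts about the ordinary Möbius function used in this argument are easy to confirm by machine. The following is a minimal sketch using trial division; the function names `mobius` and `Z` are chosen here for illustration.

```python
def mobius(n):
    """Moebius function of a positive integer, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one prime factor is left over
        result = -result
    return result

def Z(n):
    """Number of m <= n with mobius(m) == 0."""
    return sum(1 for m in range(1, n + 1) if mobius(m) == 0)

print([n for n in range(1, 60) if Z(n) == 10])                       # indices with Z = 10
print([n for n in range(1, 60) if Z(n) == 11 and mobius(n) == -1])   # candidates for 2*3*5
print(Z(36))
```

The first list contains only 27, the second starts at 29, and `Z(36)` is 13, which are the values quoted in the text.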

One way to interpret what happens is that any confusion about two elements,
say $m$ and $n$, propagates and leads to confusion about $m \cdot k$ and $n \cdot k$ for
$k = 2, 3, \ldots$. Eventually $m \cdot k$ and $n \cdot k$ are sufficiently far apart for the confusion
between them, and hence for that between $m$ and $n$, to be resolvable. For
instance, the confusion between 8 and 9 is resolved since $8 \cdot 4$ and $9 \cdot 4$ can be
shown to be separated by $3 \cdot 11$. Going to higher numbers resolves uncertainties
about factorizations again and again, but you seem to have to go farther and
farther out to do it. Are all factorizations ultimately determined in this way? It is a
question of just how much information is contained in the Möbius function, taken
as an oracle.

The Möbius problem arose in 1979 from discussions between A. Zulauf and his
doctoral student P. B. Braun. It was verified at the time that $a_{mn} = a_m a_n$ for all
$mn \le 74$ if $\mu(a_n) = \mu(n)$ for all $n \le 240$, but the proof is lengthy and this
comparatively insignificant result did not seem worth publishing. Meanwhile, P. B.
Braun proposed and investigated a much stronger conjecture.


BRAUN'S CONJECTURE. Let $B = \{b_1, b_2, b_3, \ldots\}$ be an infinite Abelian semigroup
with the linear order $b_1 < b_2 < b_3 < \cdots$ such that $b_1$ is the unit element
and $a < b$ implies $ac < bc$ for $a, b, c \in B$. Define the Möbius function $\mu$ on $B$
inductively by


1. $\mu(b_1) = 1$
2. $\sum_{b_d \mid b_n} \mu(b_d) = 0$ for $n = 2, 3, 4, \ldots$


Suppose that $\mu(b_n) = \mu(n)$ for all $n \ge 1$. Then $b_{mn} = b_m b_n$ for all $m, n \ge 1$.

The point here is that uniqueness of factorization is not assumed and, accordingly,
the Möbius function is defined in a way that does not depend on each $b_n$
being a unique product of powers of generators. Of course, if Braun's conjecture is
true then it follows that the fundamental theorem of arithmetic does in fact apply
to $B$.

Here is a quick sketch of the way things work out for the first few positive
integers. Considering the Möbius condition $\sum_{b_d \mid b_n} \mu(b_d) = 0$ for $n = 2, 3, 4, \ldots$ in
turn, and bearing in mind the ordering condition, one can build a tree of all
possible factorizations on which each path from the root exhibits a string of
factorizations of $b_1, b_2, b_3, \ldots, b_n$ that meets all requirements. At $n = 14$ there
will be four free branches, namely, writing $n$ instead of $b_n$ for brevity,

                         1
                         2
                         3
                         4 = 2^2
                         5
                         6 = 2·3
                         7
        8 = 2^3                        8 = 3^2
        9 = 3^2                        9 = 2^3
        10 = 2·5                       10 = 2·5
        11                             11
        12 = 2^2·3                     12 = 2^2·3
        13                             13
    14 = 2·7    14 = 3·5           14 = 2·7    14 = 3·5


At $n = 21$ each free branch splits in two: $21 = 3 \cdot 7$ or $2 \cdot 11$. At $n = 22$ two of
the eight free branches split in two, and it is at this stage that the possibility of
non-unique factorization first occurs: there are two paths that have $22 = 3^3 = 5^2$
and, along the way, $8 = 3^2$, $9 = 2^3$, $20 = 2^2 \cdot 5$, $21 = 3 \cdot 7$. But in this case
$7 \cdot 7 > 5 \cdot 6 = 3 \cdot 10 > 3 \cdot 9 = 2 \cdot 12$, which would imply that $2 \cdot 11$ is less than all
other composite elements exceeding 22, that $2 \cdot 11$ has unique factorization, and
that $\mu(2 \cdot 11) = -\mu(1) - \mu(2) - \mu(11) = 1$ by the Möbius condition. But $n > 22$
and $\mu(n) = 1$ imply that $n \ge 26$. Hence 24 would be less than $2 \cdot 11$ and therefore
irreducible, in contradiction of $\mu(24) = 0$. The possibility $22 = 3^3 = 5^2$ is thus
eliminated, and the number of free branches at $n = 22$ reduces from ten to eight.
More branches sprout at $n = 24$, 25 and 27, but at $n = 28$, the possibility
$14 = 3 \cdot 5$ is eliminated, and at $n = 32$ the possibility $8 = 3^2$ is eliminated. This
greatly reduces the number of free branches, and it has now been established that
$b_{mn} = b_m b_n$ for all $mn \le 20$ if $\mu(b_n) = \mu(n)$ for all $n \le 32$. The uncertainty about
21 (and 22) is removed at $n = 46$, but by then many new uncertainties have been
encountered, and one can but hope that all uncertainties are eventually resolved.
PROBLEMS AND SOLUTIONS


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


PROBLEMS 


10445. Proposed by Alan J. Gross, Medical University of South Carolina, Charleston, SC, and 
Hong Zhang, Indiana-Purdue University, Fort Wayne, IN. 


Note that $5^2 + 5 + 2 = 2^5$. Are there any other positive integers $a$ and $b$ with $a^b + a + b =
b^a$?


10446. Proposed by Hubert Kiechle, Technische Universitat, Munich, Germany. 


Let T = {z : |z| = 1} be the unit circle in the complex plane, and let w be a given nonzero 
complex number. 


(a) If $|w| < 2$, show that there are unique $z_1, z_2 \in T$ such that $w = z_1 + z_2$.
(b) If $|w| > 2$, show that $w$ can be written as a sum of $\lceil |w| \rceil$ elements of $T$.
(c) Under what conditions will $w$ be a unique sum of $n$ elements of $T$?
10447. Proposed by Stephen C. Locke, Florida Atlantic University, Boca Raton, FL.


Consider a tournament in which every pair of teams play a match in which one of the two
wins. Let $L_0$ be a listing of the teams in some order, and define successive $L_i$, $i = 1, 2, 3, \ldots$
by repeated application of the following operation: if a team in the list $L_i$ lost to the team
immediately following it in the list, call that pair of teams a switchable pair; the order of
one switchable pair is then reversed to give $L_{i+1}$. Note that this may increase the number
of switchable pairs.

Prove that any such sequence of operations leads, in a finite number of steps, to a list 
in which every team defeated the team immediately following it in the list, so there are no 
switchable pairs. 
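
The operation in Problem 10447 is easy to experiment with. The sketch below is an illustration only, not a solution: it builds a random tournament, then reverses randomly chosen switchable pairs until none remain. The names `random_tournament` and `sort_by_switching` and the encoding `beats[a][b]` ("team a defeated team b") are assumptions made for this example.

```python
import random

def random_tournament(n, rng):
    """beats[a][b] is True exactly when team a defeated team b."""
    beats = [[False] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < 0.5:
                beats[a][b] = True
            else:
                beats[b][a] = True
    return beats

def sort_by_switching(order, beats, rng):
    """Reverse randomly chosen switchable pairs until none remain."""
    order = list(order)
    steps = 0
    while True:
        # Position i is switchable if the team there lost to its successor.
        switchable = [i for i in range(len(order) - 1)
                      if beats[order[i + 1]][order[i]]]
        if not switchable:
            return order, steps
        i = rng.choice(switchable)
        order[i], order[i + 1] = order[i + 1], order[i]
        steps += 1

rng = random.Random(0)
beats = random_tournament(8, rng)
final, steps = sort_by_switching(range(8), beats, rng)
print(final, steps)
```

In the returned list every team defeated the team immediately following it; the number of steps depends on which switchable pair is chosen each time.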


10448. Proposed by Fu-Chuen Chang, National Sun Yat-sen University, Kaohsiung, Taiwan. 


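Fix a positive integer $n$. Let $x_i = \cos\left(\frac{(2i-1)\pi}{2n}\right)$ for $1 \le i \le n$, and $c_k = \sum_{i=1}^{n} x_i^k$ for
$k \in \mathbb{N}$. Show that

$$c_k = \begin{cases} 0, & k = 1, 3, \ldots, 2n-1, \\ n \binom{k}{k/2} 2^{-k}, & k = 0, 2, \ldots, 2n-2. \end{cases}$$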
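
The identity in Problem 10448 can be checked numerically for small $n$. The sketch below assumes that the $x_i$ are the roots of the Chebyshev polynomial $T_n(x)$, that is $x_i = \cos((2i-1)\pi/(2n))$, and that the value for even $k$ is $n \binom{k}{k/2} 2^{-k}$; the names `power_sum` and `claimed` are illustrative, and a floating-point check is of course not a proof.

```python
from math import comb, cos, isclose, pi

def power_sum(n, k):
    """c_k: the sum of the k-th powers of the n roots of T_n."""
    return sum(cos((2 * i - 1) * pi / (2 * n)) ** k for i in range(1, n + 1))

def claimed(n, k):
    """Assumed closed form: 0 for odd k, n * C(k, k/2) / 2^k for even k."""
    return 0.0 if k % 2 else n * comb(k, k // 2) / 2 ** k

n = 5
for k in range(2 * n):
    print(k, round(power_sum(n, k), 12), claimed(n, k))
```

For each $k$ from 0 to $2n - 1$ the two printed columns should agree up to rounding error.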


10449. Proposed by Frank Schmidt, Arlington, VA. 


For which $n$ can the symmetric group $S_n$ be generated by two conjugate permutations?


10450. Proposed by Kenneth S. Williams, Carleton University, Ottawa, Ontario, Canada and 
Blair K. Spearman, Okanagan University College, Kelowna, B. C., Canada. 


Let $K$ be a quartic extension of the field $\mathbb{Q}$ of rational numbers. $K$ is called a pure
extension of $\mathbb{Q}$ if there is an integer $l$ such that $K = \mathbb{Q}(\sqrt[4]{l})$, and $K$ is called a bicyclic
extension of $\mathbb{Q}$ if there exist integers $m$ and $n$ such that $K = \mathbb{Q}(\sqrt{m}, \sqrt{n})$. Determine all
quartic extensions that are both pure and bicyclic.


10451. Proposed by Joaquín Gómez Rey, I. B. "Luis Buñuel", Alcorcón (Madrid), Spain.


In the story below, $m$, $n$ and $r$ are integers with $0 < r < m < n$.
“Once upon a time, there lived a miserly king who had m gold coins and n silver ones. 
One day, he put n + r coins in his right pocket, and the remaining m — r coins in his 
left pocket. For the rest of his life, it was his pleasure, once each day, to take a coin 
at random from each pocket, privately admire the two coins, and then return each to 
the opposite pocket. In other ways, he was a good king and lived happily in his castle 
for many years. In all those years, no one ever knew how many gold coins were in 
his left pocket on any particular day.” 

Determine the most likely number of gold coins in his left pocket in the long run. 


NOTES 


(10447) This problem gives an alternative proof of Rédei's theorem that every tournament
has a Hamiltonian path. (10448) The $x_i$ are the roots of the Chebyshev polynomial $T_n(x)$. See
Theodore J. Rivlin, Chebyshev Polynomials, Wiley, 1990, for more information about these
polynomials. (10451) The process described here is an example of a "Pólya Urn Scheme".
It is related to the use of these processes to model diffusion. In W. Feller, Introduction
to Probability Theory, vol. I, two such models are described. The process described here
is more similar to the Bernoulli-Laplace model than to the Ehrenfest model, but Feller's
exposition of both models may be helpful. This material is found in many places in the
book, but may be easily located by using the index.
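
For readers who want to experiment with the process in Problem 10451, the sketch below models it as a Markov chain whose state $g$ is the number of gold coins in the left pocket, and checks exactly (with rational arithmetic) that a hypergeometric distribution is left unchanged by one day's exchange, as is known for the Bernoulli-Laplace model. The modeling choices and the names `transition`, `hypergeometric` and `step` are assumptions of this example; it does not by itself identify the most likely value of $g$.

```python
from fractions import Fraction
from math import comb

def transition(g, m, n, r):
    """Return {next_state: probability} when the left pocket holds g gold coins."""
    left, right = m - r, n + r
    gold_left = Fraction(g, left)          # coin drawn from the left is gold
    gold_right = Fraction(m - g, right)    # coin drawn from the right is gold
    probs = {g - 1: gold_left * (1 - gold_right),
             g + 1: (1 - gold_left) * gold_right}
    probs[g] = 1 - probs[g - 1] - probs[g + 1]
    return {state: p for state, p in probs.items() if p > 0}

def hypergeometric(m, n, r):
    """Distribution of gold coins among m - r coins drawn from m gold and n silver."""
    size = m - r
    total = comb(m + n, size)
    return {g: Fraction(comb(m, g) * comb(n, size - g), total)
            for g in range(max(0, size - n), min(m, size) + 1)}

def step(dist, m, n, r):
    """Push a distribution over states through one day of the process."""
    out = {}
    for g, p in dist.items():
        for h, q in transition(g, m, n, r).items():
            out[h] = out.get(h, 0) + p * q
    return out

m, n, r = 4, 6, 1            # arbitrary small example values
pi = hypergeometric(m, n, r)
print(pi)
print(step(pi, m, n, r) == pi)
```

Exact fractions are used so that the comparison in the last line is not affected by rounding.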


SOLUTIONS 


Finitely Many Primes in Every Translate 


10208 [1992, 266]. Proposed by Solomon W. Golomb, University of Southern California, Los 
Angeles, CA. 


Let $1 < a_1 < a_2 < a_3 < \cdots$ be an increasing sequence of positive integers.

(a) Is there such a sequence $\{a_k\}$ having the property that, for all integers $n$ (positive, negative,
or zero), $\{a_k + n\}$ contains only finitely many primes?

(b) Is there such a sequence $\{a_k\}$ and a constant $B > 0$ having the property that $\{a_k + n\}$
contains no more than $B$ primes for every integer $n$?


Solution (part a only) by Kevin Ford (student), University of Illinois, Urbana IL. The answer
to part (a) is yes. Take $a_k = ((2k)!)! + k!$. If $|n| \ge 2$ and $k \ge |n|$, then $a_k > 2|n|$ and $|n|$ is
a proper divisor of $a_k + n$. Hence, $\{a_k + n\}$ contains at most $|n| - 1$ primes when $|n| \ge 2$.
Each of the sequences $\{a_k\}$ and $\{a_k + 1\}$ contains at most one prime since, for $k \ge 2$, $k$ is
a proper divisor of $a_k$ and $k! + 1$ is a proper divisor of $a_k + 1$. If $k \ge 3$, then $k! - 1$ is a
proper divisor of $a_k - 1$ and thus $\{a_k - 1\}$ contains at most two primes.
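
The divisibility claims in this solution can be spot-checked for the first few terms. This is a small sketch only; $a_k$ grows extremely fast, so it is evaluated here just for $k = 2$ and $k = 3$, and the helper names `a` and `is_proper_divisor` are chosen for illustration.

```python
from math import factorial

def a(k):
    """The term a_k = ((2k)!)! + k! of the sequence."""
    return factorial(factorial(2 * k)) + factorial(k)

def is_proper_divisor(d, value):
    """True if d divides value and 1 < d < value."""
    return 1 < d < value and value % d == 0

for k in (2, 3):
    checks = [is_proper_divisor(abs(n), a(k) + n)
              for n in range(-k, k + 1) if abs(n) >= 2]
    checks.append(is_proper_divisor(k, a(k)))
    checks.append(is_proper_divisor(factorial(k) + 1, a(k) + 1))
    if k >= 3:
        checks.append(is_proper_divisor(factorial(k) - 1, a(k) - 1))
    print(k, all(checks))
```

Each printed line reports whether every divisor named in the solution is indeed a proper divisor for that $k$.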


Editorial comment. Several readers observed that the fact that no answer was known to
(b) demonstrated that the answer to part (a) was "yes", but these solvers also went on to
provide examples of sequences with this property. No solution to (b) was found. Kevin
Ford, Robert High, Gerry Myerson, and the proposer all noted that a negative answer to
(b) would result if the well-known "Prime k-tuples Conjecture" were true. This conjecture
(given as A9 in Richard K. Guy, Unsolved Problems in Number Theory, Springer-Verlag,
1981, p. 15) states that if $\{k_i\}$ is a finite set of integers and for every prime $p$ the
residues of the $k_i$ mod $p$ do not form a complete residue system, then, for infinitely many values
of $n$, the set $\{k_i + n\}$ consists entirely of primes. Following is Kevin Ford's proof that the
Prime k-tuples Conjecture implies there does not exist a sequence $\{a_k\}$ and $B > 0$ with the
desired properties.

First observe that the non-existence of such a sequence is equivalent to $f_{\{a_k\}}(L) \to \infty$
as $L \to \infty$, for every increasing sequence of positive integers, where

$$f_{\{a_k\}}(L) = \max_n \left(\text{number of primes in } \{a_1 + n, \ldots, a_L + n\}\right).$$

For a sequence $\{a_k\}$ and integer $L \ge 1$, set $A_L = \{a_1, \ldots, a_L\}$. Some residue class modulo
2 contains at most $L/2$ elements of $A_L$. Remove those from the set. Some residue class
modulo 3 contains at most 1/3 of the remaining elements. Remove these from the set.
Continuing this process of removing one residue class modulo $q$ for all primes $q \le L$, we
obtain a subset of $A_L$, say $\{z_1, \ldots, z_K\}$, having the property that for every prime $p$, there is
an integer $m$ such that $z_i \not\equiv m \pmod{p}$, for every $i$. The Prime k-tuples Conjecture then
implies that, for infinitely many integers $n$, the numbers $z_1 + n, \ldots, z_K + n$ are all prime.
It follows that

$$f_{\{a_k\}}(L) \ge K \ge L \prod_{p \le L} \left(1 - \frac{1}{p}\right) \ge \frac{cL}{\log L},$$

where $c > 0$ is a constant independent of $L$ and the product is over all primes less than or
equal to $L$. Therefore, $\lim_{L \to \infty} f_{\{a_k\}}(L) = \infty$, as needed.
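
The thinning step of this argument can be sketched in code as follows. The example is an illustration under stated assumptions: the input values $k^2 + 1$ are an arbitrary choice, the helper names `primes_up_to` and `thin_out` are hypothetical, and the code only carries out the removal of residue classes; the appeal to the Prime k-tuples Conjecture is not something a program can check.

```python
from collections import Counter

def primes_up_to(limit):
    """All primes <= limit, by trial division (adequate for small limits)."""
    return [p for p in range(2, limit + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def thin_out(values):
    """For each prime q <= len(values), remove a least-populated residue class mod q."""
    remaining = list(values)
    for q in primes_up_to(len(values)):
        counts = Counter(v % q for v in remaining)
        rarest = min(range(q), key=lambda c: counts[c])  # holds at most 1/q of the set
        remaining = [v for v in remaining if v % q != rarest]
    return remaining

values = [k * k + 1 for k in range(1, 41)]   # arbitrary example, L = 40
kept = thin_out(values)
print(len(values), len(kept), kept)
```

After the loop the surviving values miss at least one residue class modulo every prime $q \le L$, and their number is at least $L \prod_{q \le L} (1 - 1/q)$.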

Solved also by T. Callahan, R. J. Chapman (U. K.), W. T. Gan (student, U. K.), J. W. Grossman, R. High, N. Komanda, 
J. H. Lindsey II, O. P. Lossers (The Netherlands), R. Martin (student), G. Myerson (Australia), A. Nijenhuis, A. Riese, 


R. M. Robinson, K. A. Ross, E. R. Scheinerman, K. Stoop (Switzerland), G. Thompson, Western Maryland College 
Problems group, University of Wyoming Problem Circle, and the proposer. 


A Trace Inequality 


10234 [1992, 571]. Proposed by Götz Trenkler, University of Dortmund, Dortmund, Germany.


Let $A$ and $B$ be nonnegative definite Hermitian matrices such that $A - B$ is also nonnegative
definite. Show that $\mathrm{tr}(A^2) \ge \mathrm{tr}(B^2)$.


Solution I by Andreas Müller, Bures-sur-Yvette, France. The following proof shows that it
suffices to assume only that $A + B$ and $A - B$ are nonnegative definite.

We regard $A$ and $B$ as selfadjoint operators on the Hilbert space $H = \mathbb{C}^n$. Since $A + B$
is hermitian and nonnegative definite, we can choose an orthogonal basis of eigenvectors of
$A + B$, $e_i \in H$, $1 \le i \le n$. The eigenvalues $\lambda_i$ (with $(A + B)e_i = \lambda_i e_i$) are all nonnegative.
Then also

$$\langle e_i, (A - B)(A + B)e_i \rangle = \lambda_i \langle e_i, (A - B)e_i \rangle \ge 0$$

since $A - B$ is nonnegative definite. The trace of $(A - B)(A + B)$ is the sum of these terms,
hence nonnegative. But,

$$(A - B)(A + B) = A^2 - B^2 + AB - BA,$$
$$0 \le \mathrm{tr}((A - B)(A + B)) = \mathrm{tr}(A^2) - \mathrm{tr}(B^2) + \mathrm{tr}([A, B]) = \mathrm{tr}(A^2) - \mathrm{tr}(B^2)$$

since $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.

The claim and its proof remain valid for nonnegative operators of trace class in any 
separable Hilbert space. Trace class operators are compact. Hence the spectral theorem for 
hermitian compact operators guarantees the existence of a basis of eigenvectors for A + B. 
Also, the fact that the trace class operators form an ideal ensures that all the operators whose 
traces are needed in the above proof are of trace class. 


Solution II by Thomas H. Foregger, AT & T Bell Laboratories, Warren, NJ. We have $\mathrm{tr}(A^2) -
\mathrm{tr}(B^2) = \mathrm{tr}((A - B)A) + \mathrm{tr}(B(A - B))$, so the result follows from

Lemma. If $X$ and $Y$ are nonnegative definite hermitian matrices, then $\mathrm{tr}(XY) \ge 0$.


Proof. For such $X$ and $Y$, there exist $U$ and $V$ such that $X = UU^*$ and $Y = VV^*$. Hence,
$\mathrm{tr}(XY) = \mathrm{tr}(UU^*VV^*) = \mathrm{tr}(U^*VV^*U) = \mathrm{tr}((U^*V)(U^*V)^*) \ge 0$.
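
The decomposition used in Solution II can be checked on a concrete example. The sketch below works with real symmetric integer matrices for simplicity (an assumption of the example; the problem concerns Hermitian matrices), builds $B = VV^T$ and $A - B = WW^T$ from arbitrary integer matrices $V$ and $W$, and uses hypothetical helper names.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# Arbitrary integer matrices; V V^T and W W^T are nonnegative definite.
V = [[1, 2, 0], [0, 1, -1], [3, 0, 1]]
W = [[2, 0, 1], [1, 1, 0], [0, -2, 1]]
B = matmul(V, transpose(V))
D = matmul(W, transpose(W))                                   # D plays the role of A - B
A = [[B[i][j] + D[i][j] for j in range(3)] for i in range(3)]

lhs = trace(matmul(A, A)) - trace(matmul(B, B))
rhs = trace(matmul(D, A)) + trace(matmul(B, D))
print(lhs, rhs, trace(matmul(D, A)), trace(matmul(B, D)))
```

The first two printed numbers agree, and the last two (the traces covered by the Lemma) are nonnegative.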


Solution III by Duane W. Bailey, Amherst College, Amherst, MA. If $X$ is an $n$ by $n$ hermitian
matrix, let $\lambda_1(X) \le \lambda_2(X) \le \cdots \le \lambda_n(X)$ denote its eigenvalues arranged in order. Then,
since $A - B$ is nonnegative definite, Corollary 4.3.3 on p. 182 of Roger A. Horn & Charles R.
Johnson, Matrix Analysis, Cambridge, 1985 gives

$$\lambda_k(B) \le \lambda_k(B + (A - B)) = \lambda_k(A) \quad \text{for } k = 1, 2, \ldots, n.$$

If we further assume that $B$ is nonnegative definite, it follows that $A$ is also nonnegative
definite and

$$\mathrm{tr}(B^2) = \sum_{k=1}^{n} \lambda_k(B)^2 \le \sum_{k=1}^{n} \lambda_k(A)^2 = \mathrm{tr}(A^2).$$
Editorial comment. Many solvers recognized that the result could be obtained in different
ways and gave two proofs. Pei Yuan Wu noted that Solution III easily gives that $\mathrm{tr}(A^p) \ge
\mathrm{tr}(B^p)$ for any nonnegative number $p$, and that the same conclusion for $0 \le p \le 2$ can
be obtained from Solution II since $A^{p/2} - B^{p/2}$ will be nonnegative definite for these $p$
(see Theorem 3 of N. N. Chan & Man Kam Kwong, "Hermitian matrix inequalities and a
conjecture", this MONTHLY, 92 (1985), 533-541). Dennis I. Merino gave a converse to the
lemma of Solution II: A given matrix $A$ is nonnegative definite if and only if $\mathrm{tr}(XA) \ge 0$
for every nonnegative definite matrix $X$.


Solved also by K. V. Bhagwat (India), D. Callan, R. J. Chapman (U. K.), J. Dai & Q. Luo (China), E. A. Herman, 
R. H. Jeurissen (The Netherlands), N. Kang (student, Korea), M. K. Kinyon, O. Krafft (Germany), G. Letac (France), 
D. I. Merino, J. M. Monier (France), A. Nijenhuis, I. Olkin, H. Ozdemir (Turkey), M. Qian, E. T. Wong, P. Y. Wu (China), 
University of Wyoming Problem Circle, and the proposer. 


Area of a Roulette 


10254 [1992, 782]. Proposed by E. Ehrhart, Université de Strasbourg, Strasbourg, France. 


The curve traced out by a fixed point of a closed convex curve as that curve rolls without
slipping along a second curve will be called a "roulette". Let $S$ be the area of one arch of
a roulette traced out by an ellipse of area $s$ rolling on a straight line. Prove or disprove that
$S \ge 3s$, with equality only if the ellipse is a circle.


Solution by Murray S. Klamkin, University of Alberta, Edmonton, Alberta, Canada. We will
show that the inequality is equivalent to $(a - b)(a - 2b) \ge 0$ where $a$ and $b$ are the major
and minor semi-axes of the ellipse, respectively. Consequently, there will be equality if
$a = b$ or $a = 2b$. There will be strict inequality only if $a > 2b$.

Recall that the pedal of a given curve with respect to a point P is the locus of the foot of the 
perpendicular from P to a variable tangent line to the curve. The desired result follows from 
the following results of Steiner that can be found in B. Williamson, The Integral Calculus, 
Longmans, Green and Co., London, 1941, 201-203. 

(A) When a closed curve rolls on a straight line, the area between the line and the roulette 
generated in a complete revolution by any point on the rolling curve is double the area 
of the pedal of the rolling curve, this pedal being taken with respect to the generating 
point. 

(B) The area of the pedal of an ellipse of semiaxes $a$ and $b$ with respect to any point $P$ is
given by $\pi(a^2 + b^2 + |OP|^2)/2$, where $O$ is the center of the ellipse.

In the interest of simplicity, these theorems have been stated only when $P$ lies on the curve.
This is not an essential restriction.

Clearly, the minimum of $S$ for $P$ on the ellipse occurs for $|OP| = b$. Hence, $S \ge 3s$ is
equivalent to

$$\pi(a^2 + 2b^2) \ge 3\pi ab \quad \text{or} \quad (a - b)(a - 2b) \ge 0.$$
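
To see where the inequality holds, it helps to look at the ratio $S/s$ directly. With $|OP| = b$, results (A) and (B) give $S = \pi(a^2 + 2b^2)$, and $s = \pi ab$, so the smallest ratio for a given ellipse is $t + 2/t$ with $t = a/b \ge 1$. The sketch below tabulates this; the function name `min_ratio` is chosen here for illustration.

```python
from math import sqrt

def min_ratio(t):
    """Smallest S/s for an ellipse with a/b = t >= 1: (a^2 + 2 b^2) / (a b)."""
    return t + 2 / t

for t in (1.0, sqrt(2), 1.5, 2.0, 3.0):
    print(f"a/b = {t:.4f}   S/s = {min_ratio(t):.4f}")
```

The ratio equals 3 at $a/b = 1$ and $a/b = 2$, drops below 3 in between, and is smallest at $a/b = \sqrt{2}$, where it equals $2\sqrt{2}$.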


Editorial comment. The other solvers were able to work through the Calculus without
references, but as one of them said: "... I hope some are more elegant in the way they prove
the result; I just ground out the integral...".

The work leading to the formulation of this problem can be found in E. Ehrhart, "Les
roulettes d'ellipses", L'Ouvert 62 (1991), 43-45.

Other references to Steiner's theorem found by the editors are E. Goursat, A Course in
Mathematical Analysis, Vol. 1, Dover, 1959, where it is problem 23 (with hints) on p. 207,
and J. Edwards, A Treatise on the Integral Calculus, Chelsea, 1955, article 673, pp. 696-697,
which refers back to W. H. Besant, Tract on Roulettes and Glissettes, 1870 (and not to Steiner).

Richard Holzsager suggested that it would be interesting to find the convex curve $C$ and
point $P$ on $C$ which gives the minimum ratio of the area of the roulette to the area of the
curve. He conjectured that $C$ is given by the arc of the epicycloid $x = 3\cos\theta - \cos 3\theta$, $y =
3\sin\theta - \sin 3\theta$ with its endpoints $(\pm 2, 0)$ connected by a line segment. The point $P$ is
$(0, 0)$. For this curve, the ratio can be calculated to be 8/3 while the above results show that
when $C$ is an ellipse, the smallest value is $2\sqrt{2}$.


Solved also by J. Anglesio (France), M. V. Bjelica (Yugoslavia), and R. Holzsager. One incorrect solution was 
received. 


The Case of Horological Interchangeability 
10260 [1992, 873]. Proposed by Gerald Weinstein, The City College, CUNY, New York, NY. 


A man has a bizarre watch with indistinguishable hands. An act of violence, taking place 
sometime between midnight and the following noon, simultaneously kills him and stops his 
watch. Is it always possible to determine the time of death uniquely from this information 
if: 

(a) the watch has hour, minute and second hands? 

(b) the watch has only hour and minute hands? 


Solution by John H. Lindsey II, Ft. Myers, FL. Let the positions of the hour, minute,
and second hands as a fraction of the way around the dial from 12 be given by x, y, z,
respectively. Then $0 \le x, y, z < 1$, and there are integers $0 \le m < 12$ and $0 \le n < 60$
such that $12x = m + y$ and $60y = n + z$. An ambiguity means that analogous equations
with integers i, j in place of m, n hold after some nonidentity permutation of x, y, z.

First consider the transposition (x, y). Using the additional equation $12y = i + x$,
we have $144y = 12(i + x) = 12i + m + y$. Thus $y = (12i + m)/143$, and hence
$x = (m + y)/12 = (12m + i)/143$. If there is no second hand, any choice of $i \ne m$ will
yield an ambiguous time. Thus the answer to (b) is no. If there is a second hand, however,
we have $y - x = \frac{n+z}{60} - \frac{j+z}{60} = \frac{n-j}{60}$ and also
$y - x = \frac{12i+m}{143} - \frac{12m+i}{143} = \frac{11(i-m)}{143} = \frac{i-m}{13}$.
Since 13 and 60 are relatively prime, this forces y − x to be an integer. Hence x = y and
there is no ambiguity.

Since the hour hand determines the time by itself, the transposition (y, z) cannot yield 
an ambiguity. The transposition (x, z) leaves the minute hand unchanged. Since the minute 
hand determines the second hand, this forces x = z and there is no ambiguity. 

Finally, we consider 3-cycles. As (x, y, z) and (x, z, y) are inverses, we need only
consider one of them. Suppose $12z = i + x$ and $60x = j + y$. Then
$j + y = 5(12x) = 5(m + y)$, and hence $4y = j - 5m$. But now $n + z = 60y = 15(j - 5m) \in \mathbb{Z}$. This
requires z = 0. Since z is the position of the hour hand in the second reading, this requires
x = y = z = 0. If the event was strictly between midnight and noon, then the answer to (a)
is yes.
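The case analysis above can be cross-checked by brute force. The sketch below is an illustration and not part of Lindsey's solution: it measures time as a fraction k/D of the twelve-hour cycle, so the hour, minute and second hands sit at k, 12k and 720k modulo D. The function names are hypothetical. The denominator 143 is the one appearing in the solution; the extra candidate denominators 8639 = 12·720 − 1 and 518399 = 720² − 1 are an assumption introduced here, obtained by supposing that the hour hand of the second reading is the original minute or second hand.

```python
def hands(k, D, n_hands):
    """Hand positions, as numerators over D, at time k/D of a 12-hour cycle."""
    speeds = (1, 12, 720)[:n_hands]  # turns per 12 hours: hour, minute, second
    return tuple((s * k) % D for s in speeds)

def ambiguous_times(D, n_hands):
    """Numerators k such that the hands at k/D also fit a different time."""
    found = []
    for k in range(D):
        h = hands(k, D, n_hands)
        target = sorted(h)
        # In any other valid reading, the hour hand must be one of the hands.
        for k2 in set(h):
            if k2 != k and sorted(hands(k2, D, n_hands)) == target:
                found.append(k)
                break
    return found

# Watch with hour and minute hands only.
two_hand = ambiguous_times(143, 2)

# Watch with hour, minute and second hands, one list per candidate denominator.
three_hand = [ambiguous_times(D, 3) for D in (143, 8639, 518399)]

print(len(two_hand), [len(f) for f in three_hand])
```

Under these assumptions `two_hand` has 132 entries, one for each pair $i \ne m$, while every list in `three_hand` is empty, in agreement with the answers to (b) and (a).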


Editorial comment. Stan Wagon pointed out that part (b) previously appeared as problem 
E1571 [1963, 330; 1964, 91]. This two-handed case also appears in Thomas Szirtes, “On the 
problem of the interchangeable clock hands,” Journal of Recreational Mathematics 8 (1975-76),
159-168 and an approach to a generalization is sketched in Karel A. Post, “Letter to
the editor”, Journal of Recreational Mathematics 11 (1978-79), 41. These articles contain 
suggestions that the problem is quite old. This is confirmed by its appearance as problem 
48 in H. E. Dudeney, 536 Puzzles and Curious Problems (Martin Gardner, ed.), Scribners, 
1967, which is a reprint of earlier collections of problems. It also appears as chapter 143 of 
Joe Roberts, Lure of the Integers, MAA, 1992. 

Solved also by J. Andraos (Canada), M. Bowron, R. J. Chapman (U. K.), M. P. Eisner, D. L. Grant & M. Heggie
(Canada), I. Kastanas, O. P. Lossers (The Netherlands), W. D. McIntosh, N. Passell, S. Paul & J. Hess, R. E. Prather, K. Y.
Tsang, E. A. Weinstein, Anchorage Math Solutions Group, GCHQ Problem Solving Group (U. K.), Trinity University
Problem Group, and the proposer. Three other solvers correctly solved part (b) only, and one incorrect solution was
received.
Identities for the Catalan Generating Function


10264 [1992, 874]. Proposed by L. W. Shapiro, Howard University, Washington DC, and D. G. 
Rogers, Australian National University, Canberra, Australia. 


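Let $C_n = \frac{1}{n+1}\binom{2n}{n}$ for $n \in \mathbb{N}$ and form the generating function
$$C(x) = \sum_{n \ge 0} C_n x^n.$$

Establish the identities
(a) $\sum_{n \ge 0} (n+1)\, x^n C(x)^{2n+2} = \sum_{m \ge 0} (4x)^m$.
(b) $\sum_{n \ge 0} (2n+1)\, x^n C(x)^{2n+1} = \sum_{m \ge 0} (4x)^m$.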


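Solution by J. C. Binz, University of Bern, Bern, Switzerland. The generating function for
the Catalan numbers $C_n$ is well known to be $C(x) = \frac{1 - \sqrt{1 - 4x}}{2x}$. We use the easily verified
relations
$$xC(x)^2 = C(x) - 1 \quad\text{and}\quad \frac{C(x)}{2 - C(x)} = \frac{1}{\sqrt{1 - 4x}}.$$
With $y = xC(x)^2$, we obtain
$$\sum_{n \ge 0} (n+1)\, x^n C(x)^{2n+2} = C(x)^2 \sum_{n \ge 0} (n+1)\, y^n = \frac{C(x)^2}{(1 - y)^2} = \left(\frac{C(x)}{2 - C(x)}\right)^2 = \frac{1}{1 - 4x} = \sum_{m \ge 0} (4x)^m,$$
and similarly,
$$\sum_{n \ge 0} (2n+1)\, x^n C(x)^{2n+1} = C(x) \sum_{n \ge 0} (2n+1)\, y^n = \frac{C(x)(1 + y)}{(1 - y)^2} = \left(\frac{C(x)}{2 - C(x)}\right)^2 = \frac{1}{1 - 4x} = \sum_{m \ge 0} (4x)^m.$$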
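Both identities can be spot-checked coefficient by coefficient with truncated power series. The sketch below is only a sanity check under stated assumptions, not a proof and not part of Binz's solution: the truncation order `N` and the helper names `mul`, `power` and `lhs` are choices made here for illustration.

```python
from math import comb

N = 12  # number of coefficients compared (an arbitrary truncation order)

def mul(p, q):
    """Product of two power series, truncated to N coefficients."""
    r = [0] * N
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < N:
                r[i + j] += a * b
    return r

def power(p, e):
    """e-th power of a power series, truncated to N coefficients."""
    r = [1] + [0] * (N - 1)
    for _ in range(e):
        r = mul(r, p)
    return r

# Catalan numbers C_n = binom(2n, n) / (n + 1).
C = [comb(2 * n, n) // (n + 1) for n in range(N)]

def lhs(weight, exponent):
    """Coefficients of sum_n weight(n) * x^n * C(x)^exponent(n), truncated."""
    total = [0] * N
    for n in range(N):
        term = power(C, exponent(n))
        for m in range(N - n):
            total[n + m] += weight(n) * term[m]
    return total

lhs_a = lhs(lambda n: n + 1, lambda n: 2 * n + 2)      # identity (a)
lhs_b = lhs(lambda n: 2 * n + 1, lambda n: 2 * n + 1)  # identity (b)
rhs = [4 ** m for m in range(N)]

print(lhs_a == rhs, lhs_b == rhs)
```

Only terms with $n < N$ can contribute to the first N coefficients, because each summand carries a factor $x^n$, so the truncated comparison is exact up to that order.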


Editorial comment. Combinatorial interpretations of the formulas were submitted by 
David Callan and Renzo Sprugnoli. 


Solved also by S.-J. Bang (Korea), J. L. Bryant, D. Callan, R. J. Chapman (U. K.), J. L. Drost, S. Getu, I. Kastanas,
P. Kirschenhofer (Austria), M. S. Klamkin (Canada), I. I. Kotlarski, Y. H. Kwong, O. P. Lossers (The Netherlands),
R. Martin (student), C. R. Pranesachar (India), H. Prodinger (Austria), R. Richberg (Germany), E. Schmeichel, R. Sprugnoli
(Italy), D. B. Tyler, F.-Z. Zhao, GCHQ Problem Solving Group (U. K.), Western Maryland College Problems Group,
University of Wyoming Problem Circle, and the proposers. One incomplete solution was received.


Collaborating editors: David F Appleyard, Paul T. Bateman, Bruce C. Berndt, Duane 
M. Broline, Barry W. Brunson, Frank S. Cater, Gulbank D. Chakerian, Underwood Dudley, 
Gerald A. Edgar, Michael A. Filaseta, Ira M. Gessel, Richard A. Gibbs, Jerrold R. Griggs, 
Douglas A. Hensley, John R. Isbell, Mourad E. H. Ismail, Murray Klamkin, Daniel J. Kleitman, 
Frederick W. Luttmann, Frank B. Miles, Richard Pfiefer, Stephen L. Portnoy, J. O. Shallit, John 
Henry Steelman, Kenneth B. Stolarsky, David E. Tepper, Douglas B. Tyler, Daniel Ullman, and 
William E. Watkins. 
Why Did George Green Write His Essay
of 1828 on Electricity and Magnetism? 


I. Grattan-Guinness 


1. HONOUR TO GREEN. Among the centenaries of mathematicians and scien- 
tists celebrated in 1993, perhaps the most remarkable was the bicentenary of the 
birth of a professional miller and part-time mathematician, one George Green 
(1793-1841) of Sneinton, then near Nottingham. Among other achievements, he 
was the creator of theorems and functions now named after him which make him a 
principal contributor to potential theory and to its applications in mechanics and 
mathematical physics. 

During the week corresponding to that of his birth (which occurred on 14 July) 
various events took place. A three-day conference was held at the University of 
Nottingham, mainly on the use of his work in modern mathematics and physics. It 
included a visit to the mill at Sneinton, which had been restored and opened as a 
science centre in 1985. The next day a stained glass window was dedicated at the 
Gonville and Caius College, Cambridge, where he was resident from 1833 to 1837 
as an extremely mature student, and (only) for the winter of 1839-1840 as a 
Fellow. Finally, on Friday 16 July a meeting was held at the Royal Society of 
London on his life and work and the modern importance of the latter. It was 
followed by a quite exceptional event: the unveiling of a plaque in his memory in 
the floor of the nave of Westminster Abbey, close to the tomb of Isaac Newton and 
to the plaques for his first publicist Lord Kelvin, Michael Faraday and Clerk 
Maxwell. 

These events had been preceded by the publication in May of an excellent 
biography of Green (Cannell 1993)—a daunting task to write, as his life is so 
obscure (for example, no surviving likeness or portrait has ever been found, and his 
manuscripts seem to have been destroyed). It is clear, though, that in virtual 
isolation at Sneinton he taught himself Continental mathematics, and produced 
first-class research work. It was published in 1828 (his 38th year) as a 72-page 
Essay on the Mathematical Analysis of Electricity and Magnetism (Green 1828), put 
out at his own expense with the help of a subscription list. Largely ignored during 
the author’s lifetime, it has since been reprinted no less than seven times and 
translated into German. How and why was it created? 


2. THREE STRANDS IN 18TH-CENTURY MECHANICS. One of the most pro- 
found influences exercised by this fugitive work is that it raised both the status of 
potential theory in mathematics and the quality of the theorems that could be 
stated in it. Prior to this time three strands of thought in potential theory (as we 
now understand it) were active, though not necessarily with close links between 
them (Todhunter 1873, vol. 1). 

The most significant strand was the attraction of spheroids to an external point. 
Isaac Newton had found various special properties in the Principia, in his synthetic 
style: they were extended from the 1770s onwards by P. S. Laplace and
A. M. Legendre using analytical methods, especially the Legendre functions and 
surface and zonal harmonics. 

Another line came from Alexis Clairaut on the Continent from the 1730s (with 
some contributions from Colin MacLaurin in Britain soon afterwards), where 
properties of equipotential surfaces were studied; this work laid stress on the exact 
differential of a function of several variables, and assisted in the birth soon 
afterwards of the full partial differential calculus. With d’Alembert, some aspects 
of Euler’s work, and especially J. L. Lagrange, variational mechanics was devel- 
oped, in which force and velocity potentials were often used in the formation of 
differential equations. 

A third strand grew out of Daniel Bernoulli’s Hydrodynamica (1738), where 
considerations of ‘ascensis actualis et potentialis’ led to conservation of energy as a 
basis for (much) mechanics; his notions were to end up in the next century as 
kinetic and potential energy respectively, although with substantial changes in 
conception in which potential theory was to play a role. 

In addition, an isolated contribution came from Lagrange in 1762. While 
pondering ways of solving the equations for the propagation of sound in three 
dimensions, he formed volume integrals of the solution in each co-ordinate 
direction, integrated them by parts to create surface integrals, and then added up 
the resulting equations to obtain a simpler differential equation to integrate. A 
clever but ad hoc manoeuvre, it had little influence even upon its distinguished 
innovator; but it was closer to the way ahead pursued in the early 19th century 
than the strands just mentioned. 


3. POISSON AND THE APPEARANCE OF DIVERGENCE THEOREMS. Enter 
Siméon-Denis Poisson (1781-1840), the leading supporting actor in this drama,
student and then professor and graduation examiner at the Ecole Polytechnique, 
devout follower of Laplace and Lagrange in mathematical methods and physical 
modelling. Poisson inaugurated mathematical electrostatics (I shall use the word 
‘electricity’ of the time) in two papers (Poisson 1812, 1814) published by the Paris 
Academy of Sciences in which he analysed arrangements studied experimentally by 
C. A. Coulomb 30 years previously: equilibrium on a charged spheroid, and
between two spheres. The principal mathematical exercise was to modify Legendre 
functions and related potential theory to fit the assumptions made about the 
phenomena (Grattan-Guinness 1990, 496-513). 

In a short paper written soon after these two, Poisson (1813) rectified an
important oversight of his masters when he pointed out that the differential 
equation governing the potential V to a body A relative to an interior point M was 
not Laplace’s equation 


$\Delta V = 0$ (with ‘$\Delta$’ as the Laplacian operator) but $\Delta V = -4\pi\rho$, (1)
where $\rho$ was the density of material at M. He might have got this insight from his
recent work on electricity; another strong candidate is a paper of that year on the 


attraction of spheroids by Carl Friedrich Gauss, which contained a result which in 
vectors reads 


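$$\int_S \frac{d\mathbf{s} \cdot \mathbf{r}}{r^3} = 0 \ \text{ or } \ = -4\pi, \quad \text{where } \mathbf{r} = \mathbf{BM} \qquad (2)$$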


and B is an arbitrary point in A, according as M is outside or inside the surface S 
of A. Neither man dealt with the case where M is on S, when $-2\pi$ obtains in (1)$_2$
(Grattan-Guinness 1990, 418-424). 
Twelve years later Poisson came to the Academy of Sciences with another pair
of papers, this time analysing magnetism (Poisson 1826a-b). Taking a magnetic
body A to be composed of discrete ‘magnetic elements’ D, he set to zero certain
surface integrals over D expressing internal equilibrium, and wrote down volume 
integrals to state the components of attraction of A to an external point M 
relative to his imposed rectangular co-ordinate system (x, y, z). The second part of 
the first paper dealt with a ‘simplification of the preceding formulae’; integrating 
these integrals by parts with respect to (say) z led him to convert the volume 
integral to an integral over the surface S of A. I write his finding in the form 


$$\iiint H_z(x, y, z)\,dx\,dy\,dz = \iint H(x_S, y_S, z_S) \cos n \, dS, \qquad (3)$$


where H was the function expressing the components of magnetic attraction, and
n was the angle between the z-axis and the normal at the point $(x_S, y_S, z_S)$ of S.
Adding this formula to its brothers for the x- and y-directions gave him the first 
general divergence theorem in mathematics; imitating the notation of (3), it can be 
written 


$$\iiint [F_x + G_y + H_z](x, y, z)\,dx\,dy\,dz = \iint_S [F \cos l + G \cos m + H \cos n](x_S, y_S, z_S)\,dS. \qquad (4)$$


He modified it for the case when M was inside A by the manner of his proof of
(1)$_2$, and found a new term involving a factor $-4\pi/3$.

Poisson knew that his result was not restricted to convex bodies (a sum of 
integrals of the form (3)$_2$ is required as the z-axis goes in and out of A), nor to
magnetism. But he saw it simply as a convenience; triple integrals are replaced by 
double integrals (Grattan-Guinness 1990, 948-953). This point will be crucial for 
Green, as we shall see in the next section.

In a third paper, published by the Academy in the Mémoires (Poisson 1827), he 
analysed the process of magnetisation in moving bodies. A most complicated 
analysis used Legendre functions once again; but an important detail was his 
recollection of his equation (1)$_2$ for interior points, and first presentation of the
version with $-2\pi\rho$ for surface points.

Surface integrals were enjoying a springtime in French mathematics at this time. 
For example, André-Marie Ampère had been studying electromagnetism and
electrodynamics (his word) since 1820; his analysis made adroit use of both surface 
and line integrals, the latter arising naturally in connection with the attraction 
caused by current-bearing wires (Grattan-Guinness 1990, 941-961). One of his 
most remarkable results, published in 1826, was to show that Poisson’s basic 
formulae for magnetism could be restated in his own preferred conception, which 
saw magnetism as a special case of electricity and so replaced Poisson’s ‘magnetic 
elements’ with a tiny electrical solenoid (his word again). 

Another source was Joseph Fourier’s pioneering work on heat diffusion, created 
in the mid 1800s, fully published only in the early 1820s, especially in his book 
Théorie analytique de la chaleur (1822), and then receiving much attention from the 
new generation of French mathematicians. In particular, around 1826 Jean 
Duhamel and Russian visitor Mikhail Ostrogradsky independently sought to justify 
mathematically Fourier’s use of trigonometric series solutions (Grattan-Guinness 
1990, 1168-1176). Let f and g be two different special solutions for diffusion in a 
body A, and consider $I := \iiint_A fg\,dV$. Integrating by parts through A led to a
divergence theorem like Poisson’s (4); and applying Fourier’s external surface
condition showed that in fact I = 0. Hence f and g were orthogonal over A, like
the sine and cosine functions. We can see that this does not provide the justifica- 
tion sought; more to the point is the use again of surface integrals and a 
divergence theorem. 

Although these integrals were making an appearance, their presence in mathematics
was still slight. Good evidence is provided by Augustin Louis Cauchy
(1789-1857), former pupil of the Ecole Polytechnique (when Poisson was professor) 
and now professor there himself, inaugurating his revision of the calculus and 
mathematical analysis by his famous new approach with the theory of limits at the 
centre, emphasis laid upon continuity of functions. (Poisson and others there 
protested vigorously.) Above all, the derivative and integral were defined sepa- 
rately, so that the fundamental theorem of calculus became a proper theorem for 
the first time (Grattan-Guinness 1990, 707-804, including Cauchy’s concurrent 
inauguration of complex-variable analysis). However, he never furnished a defini- 
tion of either the line or the surface integral, although the required forms of 
definition would not have been hard to devise; they were too marginal to be worth 
such attention. 

Then Green started thinking. 


4. GREEN AND THE PLACE OF SURFACE INTEGRALS. Possible sources for 
Green’s essay will be appraised in the next section; here its main contents are 
described. Pages are cited from the printing in the edition of his works (Green 
1871). 

After various preliminaries, the essay contains two roughly equal parts on 
electricity and on magnetism, in that order. These latter analyses draw heavily on 
various largely known integral expressions to state external and internal potentials 
(the latter maybe learnt from Poisson’s (1)$_2$), and Legendre functions to express
the potentials in analytical form. He extended various results due to Poisson, and 
considered some variant situations, such as when the spheres are connected by a 
wire (Whitrow 1984). 

The chief novelties were presented in the ‘general preliminary results’ stated in 
the opening. First was the explicit specification of ‘the potential function,’ as he 
called it (1828, 9), and now named after him: 


It only remains therefore to find a function V’ which satisfies the partial differential equation, 
becomes equal to [a given function] $\overline{V'}$ when [the point] p is upon the surface A, vanishes when
p is at an infinite distance from A, and is besides such that none of its differential co-efficients 
shall be infinite when the point p is exterior to A. 


(1828, 12: note the inadequate specification of $V'(\infty)$, and that the prime does not
denote differentiation). This formulation anticipated in certain ways the ‘Dirichlet 
principle’, which was to assume such status in potential theory when its author 
began lecturing on the subject from 1839 in Berlin: a decade earlier he also was in 
Paris, but working on Fourier’s heat theory, Cauchy’s analysis, and number theory. 

Second was Green’s type of divergence theorem, expressed entirely within the
rectangular co-ordinate system (x, y, z) rather than with surface differentials of 
Poisson’s (4): for two ‘continuous functions’ U(x, y, z) and V(x, y, z) 


‘$\int dx\,dy\,dz\, U\delta V + \int d\sigma\, U(dV/dw) = \int dx\,dy\,dz\, V\delta U + \int d\sigma\, V(dU/dw)$’ (5)
(1828, 23). I follow his use of ‘$\delta$’ for the Laplacian operator (an unusual symbol,
perhaps required by the limitations of his printer’s font box), ‘$d\sigma$’ for the element
of the surface, all integrals stated with only one sign ‘$\int$’ (unlike Poisson’s use of
multiple integral signs), and round brackets to indicate partial ‘differential co-efficients’
(Euler’s practice, and name also, both of which Green followed). He
modified his result for ‘singularities’ in U (or V) at points G by adding in terms of
the form $-4\pi U(x_G, y_G, z_G)$ to the appropriate side of the equation (1828, 27),
like Poisson’s own modification; he may also have known of Poisson’s equation (1)$_2$
from its reappearance in (Poisson 1827). I wonder at the import of the continuity 
imposed upon U and V, and the reference to ‘singularities’; had he also been 
reading Cauchy on reforming the calculus? 
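For readers more used to vector notation, Green's theorem (5) can be restated in a modern form. The display below is a sketch under an assumption not stated in the passage above, namely that Green's $dw$ is measured along the normal pointing into the body; $\partial/\partial n$ then denotes differentiation along the outward normal, and $\nabla^2$ replaces his ‘$\delta$’.

```latex
\int_A \left( U\,\nabla^2 V - V\,\nabla^2 U \right) dx\,dy\,dz
  = \oint_S \left( U\,\frac{\partial V}{\partial n}
                 - V\,\frac{\partial U}{\partial n} \right) d\sigma ,
\qquad
\frac{\partial}{\partial n} = -\frac{\partial}{\partial w}.
```

Moving the surface terms of (5) to one side and the volume terms to the other, and substituting $\partial/\partial w = -\partial/\partial n$, gives this form, which is nowadays usually called Green's second identity.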

Green had taken up a current research interest in mathematical physics in using 
volume and surface integrals to analyse electricity and magnetism; and with his 
insights he surpassed all contemporaries. This theorem (5), while similar in 
mathematical form to Poisson’s (4), was understood at a far deeper level as physics 
(and also surpassed Gauss’s (2) in generality). Whereas Poisson saw only simplifi- 
cation in his integral, Green recognised that the importance of his own theorem lay 
in relating properties inside bodies to properties on their surfaces and vice versa. He 
must have realised that theorems of this kind served for multiple integral calculus 
like the fundamental theorem of the calculus itself; hence the importance of 
integration by parts. 

These insights doubtless led Green further to the novelty of his function V’ in 
which conditions in a body and on its surface were imposed. Such functions were 
found for various cases with the help of his theorem; one of them followed it in its 
symmetrical form (Green 1828, 37-39), and launched what have become known as 
‘reciprocity relations’. 

One may guess therefore, with some confidence, that Poisson’s first two papers 
on magnetism were the source of inspiration for Green’s research, especially the 
divergence theorem (4). Up to then Green had doubtless been learning mathemati- 
cal skills and theories, but he had not found a deep problem: Poisson (unintention- 
ally) provided this, in the form of an unexceptionable but somewhat limited use of 
Legendre functions to analyse the distribution of magnetism, and especially in a 
‘simplification’ which held much deeper consequences than its author had realised. 


5. SOURCES AND INFLUENCES. While it is possible to guess at Green’s origi- 
nal motivation, his training in mathematics remains unknown. Among local figures, 
headmaster John Toplis (1774/1775-1857) would have been a crucial figure in 
forming the interests of his former pupil at Nottingham Grammar School: a 
deplorer of the state of British mathematics in the Philosophical Magazine in 1804, 
a translator of Lacroix there a year later, and of Book One of Laplace’s Mécanique 
céleste in a book published in Nottingham in 1814. However, in 1819 he returned 
to his college (Queen’s, Cambridge), and was not to be one of the subscribers. The 
only other likely supporter is Sir Edward Bromhead (1789-1855: so Green’s senior 
by a mere four years), member with Charles Babbage and John Herschel of the 
Analytical Society at Cambridge in the mid 1810s, and a subscriber to the essay; 
but his letter of April 1828 acknowledging receipt of his presentation copy 
(Cannell 1993, 67) shows that he had not been aware of its contents before it 
arrived. 

Green’s access to literature is also little understood. In his essay he cited as 
mathematical sources Laplace’s Mécanique céleste, Book 3 (1799) for Legendre 
functions, Fourier’s Théorie analytique de la chaleur, and of course, Poisson’s three 
papers on magnetism and the two on electricity; Biot’s Traité de physique (1816)
was used for information on Coulomb’s experiments. A passing reference (1828, 
103) to Lagrange’s follower L. F. A. Arbogast shows his familiarity with some of 
the current French operator techniques. A sentence in his introduction comparing 
Fourier with Cauchy and Poisson on methods of solving differential equations in 
hydrodynamics (1828, 8) suggests that he had read the paper (Fourier 1818) on 
precisely this matter, which had been published in a Paris journal (Grattan- 
Guinness 1990, 683-686). 

How did Green gain access to these works? While British texts would have been 
available in the local library, access there to foreign literature is much less certain, 
even presuming that he had funds available to buy it. The point is particularly 
perplexing for journals—in particular, the Paris Mémoires with its Poisson papers. 
How did Green know that those papers were published there in the first place? 
Although presentations to the Academy were reported in Paris journals and 
sometimes abroad, the news did not circulate very much, and Poisson had not 
given any warning in earlier papers that research in magnetism was in progress. 
The best chance was that some summary version was translated into a foreign 
language such as English—and indeed this did happen to summaries of these two 
papers, in the Quarterly Journal of Science (Poisson 1824, 1825). 

Each summary paper began with a virtually verbatim repeat of parts of the 
opening preamble of the parent paper, and then summarised some later results
and features. The accounts concentrated mostly on physical and experimental 
aspects; mathematical procedures were only mentioned (and three formulae quoted 
in the second summary), although not in a manner to reveal any major novelties. In 
particular, the divergence theorem (4) was described only in general terms, and 
with reference to simplification: ‘by means of certain transformations, the triple
integrals which they contain are reduced to double integrals, and the equations 
become much more simple’ (Poisson 1824, 327). No reader of the time could have 
guessed that surface integrals were involved; but Green might have been alerted to 
watch out for the full versions of the papers. 

Regarding timetable, the volume of the Paris Academy Mémoires containing 
these papers appeared right at the end of 1826 (Academy of Sciences 1918, 473). 
Allowing for the usual delay for ships to deliver copies across the channel, one can 
guess that the spring of 1827 was in hand before Green read at least Poisson’s first 
paper and had his inspiration. Since his essay was to appear in April 1828, this 
would have given him a maximum of around a year to carry out the research—not 
an excessive time, even for a part-time mathematician. His motivation was high,
most of the required skills and familiarity with the literature were already available 
—and above all his ideas were fruitful, so that the fruit would grow freely and 
quickly. 


6. OPTIONS FOR PUBLICATION. However, in contrast to this splendid piece of 
research and development, Green’s sales and marketing were hopeless. He cannot 
be blamed for his scientific isolation in Nottingham, but he was somewhat naive in 
resorting to the traditions of publication by public subscription. For the increase in 
scientific activity in Britain in recent years, together with advances in printing 
technology, had raised the chances and opportunities for publication, especially for 
an author like him with financial means available to assist with the costs of 
production. He could have tried Deighton’s of Cambridge, who were then publish- 
ing quite a lot in mathematics (Grattan-Guinness 1985) and in fact stocked the 
essay when it came out; or maybe Taylor (now Taylor and Francis), regular 
producers of scientific books. He could have written a paper summarising his
findings for their Philosophical Magazine, which was widely distributed in the
scientific world: although it did not publish mathematics frequently, there were 
papers from time to time, and indeed there had been an exchange in there in 1826 
on another aspect of potential theory (namely, properties of equipotential surfaces)
between Poisson and the Scottish-born mathematician James Ivory
(Grattan-Guinness 1990, 1190-1195). In fact, if he had felt it proper so to act he 
could have sought advice from Ivory, the mathematician most conversant with 
potential theory in Britain at that time. He might also have treated his manuscript 
as a long paper instead of a short book, and tried to submit it to the Royal Society, 
or the Cambridge Philosophical Society, or the Royal Society of Edinburgh. 

I have no doubt that Green never considered any of these possibilities. His 
essay went to his 52 supporting subscribers, most of whom could not have read a 
page of it (Green (H.) 1946, 45-48); and so it vanished from sight. Very rarely has 
it appeared even in booksellers’ catalogues. 


7. ON GREEN’S SECOND PERIOD. Green’s later career was somewhat less 
unorthodox than previously, in as much as he was resident at Gonville and Caius 
College, Cambridge (Bromhead’s alma mater) from 1833 to 1837 and for some
months of 1839-1840 as a Fellow. He had a small overlap in residence with 
someone capable of understanding his work, indeed the first mathematician to cite 
the essay; but this was the eccentric Robert Murphy (1806-1843), who spoilt a 
promising career by financial incompetence.¹

Green published eight papers (and a supplement to one of them), mostly in the 
Transactions of the Cambridge Philosophical Society with Bromhead as communi- 
cator. However, his marketing skills were again to the fore: he cited his essay only 
twice (1871, 120, 192), and on neither occasion did he even give the reader the 
publication details, never mind a comment to explain its importance. 

The other papers fall into two partly related groups, both showing strong 
French influence in both content and methodology (Burkhardt 1908, ch. 13). One 
group deals with elastic bodies, which could be construed to be physically bending 
objects, or else the elastic aether (and perhaps with luck, both at once). The task 
was to study the propagation of longitudinal and transverse vibrations; Green also
tackled the difficult question of behaviour at the interface between different 
substances. He sought generality by making no stipulations about the constituted 
properties of the substances. The principal influence seems to have been the 
non-molecular studies of elasticity made from 1827 onwards by Cauchy, which had 
been partly inspired by Fresnel’s work in waval optics (Whittaker 1951, ch. 5). 

The other group examined the potentials of fluids, which again might cover 
sound and water, but also the supposed electric and magnetic fluids. Green made 
this analogy quite explicit in the title of the first of these papers, when referring to 
the ‘laws of the equilibrium of fluids analogous to the electric fluid’ (1871, 117). 

For methods Green used both his function and theorem, and some of the 
special results from his essay. He played a little more with operator methods, and 
produced some solutions in terms of elliptic integrals (although he seemed to be 


¹ On Green’s and Murphy’s work see (Cross 1985); the reference to Green’s book is in (Murphy
1833, 587). The nature of Murphy’s misdemeanours has not been clear; my information comes from a 
letter of perhaps 1835 to Babbage written by Augustus De Morgan, who was Professor in London 
University, where Murphy was then trying to make a living (British Library, Additional Mss. 37189, no. 
241). Compare (Cannell 1993, 112-113). 
unaware of the recently introduced elliptic functions). A paper on the motion of
waves in canals took a step towards the approximating asymptotic solution
method now known as ‘WKB’ (Schlissel 1977, 309-314), although he limited 
himself to working within the linearising models of his time. He worked with 
potentials to the inverse nth power; and on one occasion he required of his 
potential function that it be invariant under infinitely small rotations, a step that 
Sophus Lie was to bring (independently) to great generality and prominence 60 
years later. 

Somewhat separate from Green’s other papers was one dealing with the motion 
of the ‘simple’ pendulum. This was a favourite topic at this time, a typical example 
of small-effect science; for the pendulum was required to work to a great degree of 
accuracy for the purposes of geodesy. Laplace and Poisson, and also F. W. Bessel 
and G. B. Airy, had been among its many earlier students (Wolf 1889-1891). 


8. RECOGNITION. Green’s marketing skills increased at least to the extent that 
he sent some of these papers to Carl Jacobi (Cannell 1993, 104; the copies are in 
private possession), and presumably while at Cambridge he gave copies of his essay 
to William Hopkins, who passed either two or three copies on to the young 
William Thomson (1824-1907) in 1845.² Then, as is famously known, the essay 
found its first enthusiastic reader, four years after the death of its author in 1841. 
Thomson introduced the name ‘Green’s theorem’, and soon came to his ‘method 
of images’ as a result of reading the analysis in the essay of the effect on the 
electrical charge in a body at an interior/exterior point of a source at a given 
exterior/interior point (Green 1828, 50-54). He soon arranged for the essay to be 
reprinted in Crelle’s journal, although it did not appear until 1850-1854.³ Later he 
and P. G. Tait called the Dirichlet principle ‘Green’s problem’ (Thomson and Tait 
1883, arts. 499-518). The name ‘Green’s function’ for functions satisfying conditions 
like Green’s own is due to Bernhard Riemann and Carl Neumann (Burkhardt 
and Mayer 1900, art. 18). 

Today, Green’s function and his theorem are extolled because of the roles 
which they continue to play in modern physics and in engineering; but it would be 
a misunderstanding of history to think that their importance is due to these 
applications. On the contrary, their rise occurred during the period of classical 
physics, when there appeared a mountainous production of books and papers on 
potential theory and its use in mathematical physics (Bacharach 1883); all the 
applications mentioned above were involved, and in due course new ones such as 
thermodynamics and meteorology, and also mathematical economics. All major 
applied mathematicians took part, along with many minor ones, and some pure 
mathematicians also (in particular Karl Weierstrass, who sabotaged standard 
methods of manipulation in 1870 with his famous counter-example to the Dirichlet 
principle using the inverse tangent function). 

Not only Green’s insights and results were used by his successors; his own work, 
especially the essay, was made available four times in the last thirty years of the 


²One of these copies of Green’s essay is now kept at Nottingham (Cannell 1993, 105), and another is 
in Keele University Library; I do not know the location of the third one (if there was one). 

³The circumstances of this reprinting of the essay are strange. Firstly, Thomson asked Crelle and 
not his friend Joseph Liouville, who also edited a journal (still often known after Liouville) and was 
actively interested in potential theory. Secondly, while Crelle had agreed enthusiastically to the 
suggestion by 1846 (Green (H.), 41-43), he did not reprint it for several years, and then in three parts 
over five years. 
19th century to an extent surpassing all other literature of his own time. The 
edition of his works by N. Ferrers appeared in London in 1871, and was reprinted 
in facsimile in 1903 in (of all places) Paris. The essay itself was also reprinted in 
facsimile, in 1890 in Berlin, in a series of classic reprints of science; five years later 
it appeared in an annotated German translation by A. von Oettingen and 
A. Wangerin, in Wilhelm Ostwald’s famous booklet series of editions of major 
scientific works. Green’s successors in the classical phase not only absorbed his 
contributions into their own heritage; they wanted to read the words of the master 
himself. 

Their modern successors have maintained the tradition; for Ferrers’s edition 
appeared again in 1970, and the Essay itself in 1993, in the university of his home 
town Nottingham, as part of their bicentennial celebrations of their remarkable 
citizen. 
Lifting Weights 


I once, long ago, went into my favorite dean's office to argue 
once more about good teaching. I remarked that I was teaching 
a weight lifting class (which he knew I was not). I said that 
graduation required lifting 250 pounds, that many students got 
discouraged and dropped out, some repeated the course, and 
very few graduated. “But last night,” I said, “I had the idea 
that many more would graduate if I cut the weights in half and 
graduation would then require lifting one set of 125 pounds, 
setting them down, and then lifting the second set, thus lifting 
the 250 pounds.” 


Is it reasonable to compare physical muscles to mental 
muscles? What indeed are good lecturing and good teaching? 
To what extent does making the teaching clearer and learning 
easier cut the weights in half? Surely, in the long run and for 
most courses, developing the student's abilities (mental 
muscles) to learn is more important than the course content. 
How often do we invert them? 


R. W. Hamming 
Naval Postgraduate School 
Monterey, CA 93943 
Communicating Mathematics: Useful 
Ideas from Computer Science 


Charles Wells 


1. INTRODUCTION 


1.1. Purpose. This article describes certain ideas originating in the theory and 
practice of computer science, and shows how the teaching and exposition of 
mathematics could benefit if these ideas were widely understood by mathemati- 
cians and used in their teaching and writing. 

These ideas are discussed here because I believe they are important for 
mathematicians to understand. Some of them are based on theoretical work by 
computer scientists and others are based on the practice of computer professionals 
inside and outside academia. Computer scientists would not regard the various 
concepts as of equal importance, and the whole collection of ideas is nothing like a 
fair presentation of the current state of computing. 


2. SPECIFICATION 


2.1. External behavior. A programmer writing a large program may have a tentative 
conception of how to write the program in terms of subprograms that perform 
specific tasks. For example, a program for factoring large integers might use a 
function PrimeQ: $\mathbb{Z} \to \{\mathrm{True}, \mathrm{False}\}$ with the property that PrimeQ[n] 
returns True if the integer n is prime and False otherwise. This description gives 
the function’s external behavior¹ but says nothing about how that behavior is 
implemented. Perhaps the first implementation of PrimeQ[n] will test whether 
any integer k for which $1 < k \le \sqrt{|n|}$ divides n. This might be enough for 
debugging the program that uses PrimeQ, but not fast enough for practical use. 
Later, an implementation using modern fast techniques could be substituted. Since 
the external behavior of the function PrimeQ is the same in either implementation, 
substituting the new implementation for the old should not introduce new 
bugs in the program. 
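
As an illustration, the following Python sketch gives two implementations that meet the same specification. The function names are hypothetical, the second function is only a modestly faster stand-in for the fast techniques mentioned above, and testing n through its absolute value is an assumption of this sketch.

```python
from math import isqrt


def is_prime_trial(n: int) -> bool:
    """First implementation: try every k with 1 < k <= sqrt(|n|)."""
    n = abs(n)  # assumption: primality is judged on |n|
    if n < 2:
        return False
    return all(n % k != 0 for k in range(2, isqrt(n) + 1))


def is_prime_wheel(n: int) -> bool:
    """Replacement: after ruling out 2 and 3, test only k = 6i - 1 and 6i + 1."""
    n = abs(n)
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0 or n % 3 == 0:
        return False
    k = 5
    while k * k <= n:
        if n % k == 0 or n % (k + 2) == 0:
            return False
        k += 6
    return True


# A caller that relies only on the external behavior cannot tell them apart.
agree = all(is_prime_trial(n) == is_prime_wheel(n) for n in range(-50, 2000))
```

Because the two functions agree on every input tried, a program written against the specification alone can switch from one to the other without any other change.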

In this section, I discuss some issues involved in presenting mathematics to 
students that can be clarified by this idea of specifying external behavior. 


2.2. Is everything a set? One concern of those who study the foundations of 
mathematics is to show how to develop the main body of modern mathematics 
from a small number of principles or concepts that are as clear and primitive as 
possible. The bulk of the work in this direction has been to reduce all mathematical 
constructions to the primitive notion of “set” and “element of a set” and to 


¹Many practicing programmers call this its “functional behavior” but I would avoid that phrase in 
teaching mathematics students because of confusion with the function concept. My thanks to the 
referee for suggesting the name “external behavior”. 
impose a small number of clear axioms on these primitives.² In carrying this 
program out, an ordered pair ⟨a, b⟩ is typically interpreted as the set {{a, b}, {b}}, 
and a function as a set of ordered pairs with the functional property. 

Many mathematicians have taken this approach to mean that every mathemati- 
cal object is really a complicated set. At least, they say this when the topic of 
foundations comes up. They often don’t behave as if they actually believe every 
mathematical object is a set. Wouldn’t you expect a mathematician to be confused, 
at least momentarily, if you asked which points in the plane had nonempty 
intersection with the point (3, 2)?³ I maintain that in practice many mathematicians 
regard points as one type of mathematical object, sets as another, and 
perhaps functions as a third. Since intersection is an operation defined on sets and 
points are not sets, the question about which points have nonempty intersection 
with (3, 2) is to be rejected as meaningless. 

The best way to think of the reduction to sets that has been carried out by those 
who study foundations is that it is a representation of mathematics which is 
desirable for various reasons, for example showing consistency. Although an 
ordered pair is not a set, it can be represented as a set and that representation may 
be useful for certain purposes. 


2.3. Specification in exposition. My thesis in this section is that we should borrow 
the idea of specifying external behavior and use specifications rather than defini- 
tions of many common mathematical objects in a way that will exhibit how the 
objects relate to other types of objects. The formal definition of a concept may 
require the mention of details of representations used for other purposes (such as 
consistency proofs) that obscure the way the concept is used in practice. In courses 
for undergraduates other than in foundations, we should not even attempt to say 
what sets, pairs, functions and other basic mathematical objects “really are”. What 
matters is how they relate to other objects. 


Recommendation. In elementary exposition, explain a basic concept by giving a 
specification of the concept—a carefully written description of the interaction of the 
object with other mathematical objects.⁴ 


Here are two examples based on my text [Wells, 1993], which is aimed at 
students who have had calculus but no course in abstract mathematics. In these 
specifications, I use the word “object” to denote any sort of mathematical entity. 


Ordered pairs: An ordered pair is a mathematical object which is distinct 
from but completely determined by objects called its first coordinate and its 
second coordinate. The ordered pair with first coordinate x and second 
coordinate y is denoted by ⟨x, y⟩. 


²Foundations can also be done using category theory [McLarty, 1993]. 

³This is discussed from a different point of view in [Barr, 1993]. 

⁴The word “specification” in computer science generally means a description in a formal language 
of the external behavior of a program, suitable for being transformed by strict rules into an actual 
program. In this document, the analogy is more with the informal descriptions practicing programmers 
give of the external behavior of a program, as described in Section 2.1. 
It follows that ordered pairs are the same if and only if their coordinates are the 
same, that is, 


$$(\langle x, y \rangle = \langle x', y' \rangle) \iff (x = x' \text{ and } y = y').$$


Thus we have a method of proof: To prove two ordered pairs ⟨x, y⟩ and ⟨x’, y’⟩ 
are the same, prove that x = x’ and y = y’. 
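
The difference between the specification and a representation can be checked mechanically on a small example. In this Python sketch all names are hypothetical; `set_pair` uses the set encoding {{a, b}, {b}} quoted in Section 2.2, and `unordered_pair` is included only as a representation that fails the specification.

```python
from itertools import product


def set_pair(a, b):
    """Represent the pair with coordinates a, b as the set {{a, b}, {b}}."""
    return frozenset({frozenset({a, b}), frozenset({b})})


def tuple_pair(a, b):
    """Represent the pair as a built-in Python tuple."""
    return (a, b)


def unordered_pair(a, b):
    """The plain set {a, b}: not a valid representation of an ordered pair."""
    return frozenset({a, b})


def satisfies_pair_spec(make_pair, universe) -> bool:
    """Check that two pairs are equal exactly when both coordinates agree."""
    return all(
        (make_pair(x, y) == make_pair(u, v)) == (x == u and y == v)
        for x, y, u, v in product(universe, repeat=4)
    )


print(satisfies_pair_spec(set_pair, range(4)))        # True
print(satisfies_pair_spec(tuple_pair, range(4)))      # True
print(satisfies_pair_spec(unordered_pair, range(4)))  # False
```

The first two representations are very different objects, yet a user who relies only on the specification cannot distinguish them. The check covers a four-element universe only, so it is evidence rather than a proof.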


Functions: A function F is a mathematical object which determines and is 
completely determined by the following data: 


F.1 F has a domain, which is a set and is denoted by dom F. 

F.2 F has a codomain, which is also a set and is denoted by cod F. 

F.3 For each element x ∈ dom F, F has a value at x. This value is completely 
determined by x and F and must be an element of cod F. The value of F 
at x is denoted by F(x). 
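
For finite domains, the data F.1 to F.3 can be modelled directly. The class below is a hypothetical sketch, not part of the text being quoted; it assumes the domain and codomain are finite collections of hashable values so that the conditions can be checked by enumeration.

```python
class Function:
    """A function determined by its domain, its codomain and its values."""

    def __init__(self, dom, cod, rule):
        self.dom = frozenset(dom)   # F.1: the domain is a set
        self.cod = frozenset(cod)   # F.2: the codomain is a set
        self._rule = rule
        for x in self.dom:          # F.3: every value lies in the codomain
            if rule(x) not in self.cod:
                raise ValueError(f"value at {x!r} is not in the codomain")

    def __call__(self, x):
        if x not in self.dom:
            raise ValueError(f"{x!r} is not in the domain")
        return self._rule(x)

    def __eq__(self, other):
        # Completely determined by the data: same domain, codomain and values.
        return (
            isinstance(other, Function)
            and self.dom == other.dom
            and self.cod == other.cod
            and all(self(x) == other(x) for x in self.dom)
        )


square = Function(range(-2, 3), range(5), lambda x: x * x)
also_square = Function(range(-2, 3), range(5), lambda x: abs(x) ** 2)
wider_cod = Function(range(-2, 3), range(10), lambda x: x * x)

print(square == also_square)  # True: different rules, same data
print(square == wider_cod)    # False: the codomains differ
```

Two objects built from different computing rules compare equal when they have the same domain, codomain and values, and changing the codomain alone produces a different function under this specification.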


I am sure these specifications could be improved in various ways and would 
welcome suggestions concerning them. There may be a better name than “specification” 
for the practice, too, but it seems clear that the practice should have an 
explicit name to signal its logical status. 


3. SYNTAX AND SEMANTICS. There is a sense in which 2/(4 + 3) is 2/7 and 
another sense in which 2/(4 + 3) is not 2/7. The number 2/(4 + 3) is indeed the 
same number as 2/7. The expression 2/(4 + 3) is not the same as the expression 
2/7. For one thing, the expression 2/(4 + 3) has seven symbols and the expression 
2/7 has only three. 

Syntax is the study of expressions in linguistics or computer science, and 
semantics is the study of how meaning is assigned to expressions. There are two 
different points to make concerning syntax and semantics: (a) Expressions repre- 
sent mathematical objects but are not the objects themselves, and (b) expressions 
have structure. 


3.1. Expressions and their denotations. A mathematical expression denotes a 
mathematical object. The object it denotes is not the expression—the expression is 
only a representation of the object. In particular, different expressions may denote 
the same object. 
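
The distinction can be made concrete in Python by keeping an expression as a string and its denotation as a number. The variable names are illustrative only, and exact rational arithmetic from the standard `fractions` module is used so that no rounding is involved.

```python
from fractions import Fraction

# Two expressions: sequences of symbols.
expression_a = "2/(4+3)"
expression_b = "2/7"

# Their denotations: rational numbers.
number_a = Fraction(2, 4 + 3)
number_b = Fraction(2, 7)

print(expression_a == expression_b)          # False: different expressions
print(len(expression_a), len(expression_b))  # 7 3
print(number_a == number_b)                  # True: the same number
```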

This point of view, that there is an object independent of the expressions that 
denote it, is often called “Platonist”⁵. By contrast, some assert that the expressions 
are merely themselves (“everything is syntax”) and that mathematics consists of 
manipulating these expressions according to precise rules. Presumably, those who 
hold that attitude will admit that there is an equivalence relation on expressions 
which identifies different expressions that name the same object from a Platonist’s 
point of view. In any case, people who hold these differing points of view generally 
agree on which statements are theorems. 

It is my observation that students and teachers at the college level in the USA 
don’t communicate well with each other because many teachers talk like Platonists, 


⁵This usage of the word “Platonist” does not imply an endorsement of all of Plato’s attitudes 
toward reality and truth. 
but many students have the attitude⁶ that what they need are rules to manipulate 
the expressions (more about this in 3.3 below). Students who go on to higher 
mathematics learn to talk as if the mathematical objects were “out there”, but it is 
noticeable that many college freshmen in calculus courses do not talk that way. 


Recommendation. Teachers and authors of textbooks should make the distinction 
between syntax and semantics explicit. 


If the student has words for this distinction, he or she may avoid certain types of 
confusion that result from being unaware of the distinction. 


3.2. Parsing. Computer science is intimately concerned with the relationship 
between syntax and semantics, particularly with parsing, which is the explication of 
the abstract structure of an expression such as 2/(4 + 3). This structure is often 
given as a tree 


        /
       / \
      2   +                (1)
         / \
        4   3


To parse an expression such as 2/(4 + 3) is to exhibit its structure. The first task a 
compiler for a computer language has is to parse the commands of the language, 
for only then can the commands be executed. 
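
Python’s standard `ast` module exposes the parser of the language itself, so the structure of a small arithmetic expression can be displayed directly. The helper `tree` below is a hypothetical name, and it handles only integer constants and the four binary operations.

```python
import ast


def tree(expression: str):
    """Return the structure of an arithmetic expression as nested tuples."""
    symbols = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

    def walk(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return (symbols[type(node.op)], walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


print(tree("2/(4 + 3)"))  # ('/', 2, ('+', 4, 3))
print(tree("2/4 + 3"))    # ('+', ('/', 2, 4), 3)
```

The two strings differ only by a pair of parentheses, but their trees have different roots, which is the kind of structural fact a student who reads expressions as mere strings will miss.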

Laborde [1990] notes that many students (these were mostly below the USA 
freshman level) see expressions such as 2/(4 + 3) merely as strings and are not 
really aware of their abstract structure. 


Recommendation. Introduce informal parsing of mathematical expressions as a 
learning tool. 


3.3. Mathematics as syntax. Another point of view is that we should stop talking 
like Platonists and go along with the students’ desire for rules for manipulation. 
Some hold that mathematics is a game of syntax, and that to succeed in mathemat- 
ics you must master the rules of the game (more about this in Section 4.3). You 
need not hold that view, however, to realize that a lot of mathematics is accom- 
plished precisely by syntactic manipulation—what else is high school algebra? And 
even Platonists agree that you have to master the rules of the game. 


Recommendation. Make explicit the allowable syntax for statements about a type of 
object. 


For example, one can helpfully explain application of functions by adding the 
following sentence to the specification of function in Section 2.3: 


The expression “F(x)” is meaningful if and only if “x ∈ dom F” is true, and 
in that case “F(x) ∈ cod F” is true. 


⁶Usually, this attitude is unexpressed. I am saying this out of my observations of students rather 
than what they actually say. 
4. FORMAL TRANSFORMATIONS. Formal transformations, called rewrite rules 
in many contexts, are used in many different ways in computing. In general, they 
work this way: In an expression, you recognize a subexpression as matching the 
pattern of the left side of a transformation rule, and you rewrite the expression, 
replacing the subexpression by the right side of the rule. For example, in algebra 
you may rewrite a + bx + by as a + b(x + y). Computing an integral in freshman 
calculus can be thought of as recognizing patterns and applying formal transforma- 
tions, although there may be several possible transformations to apply and the 
process need not terminate. 
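
A single rewrite rule can be sketched in Python with expressions stored as nested tuples. Everything here is an illustrative assumption: the tuple encoding, the function names, and in particular the grouping of a + bx + by as a + (bx + by), which is what makes bx + by a subexpression.

```python
def factor_rule(e):
    """Rewrite b*x + b*y as b*(x + y) if e matches; otherwise return None."""
    if (
        isinstance(e, tuple) and e[0] == "+"
        and isinstance(e[1], tuple) and e[1][0] == "*"
        and isinstance(e[2], tuple) and e[2][0] == "*"
        and e[1][1] == e[2][1]          # both products share the factor b
    ):
        b, x, y = e[1][1], e[1][2], e[2][2]
        return ("*", b, ("+", x, y))
    return None


def rewrite_once(e, rule):
    """Apply the rule at the first matching subexpression, outermost first."""
    replaced = rule(e)
    if replaced is not None:
        return replaced
    if isinstance(e, tuple):
        op, *args = e
        for i, arg in enumerate(args):
            new = rewrite_once(arg, rule)
            if new != arg:
                return (op, *args[:i], new, *args[i + 1:])
    return e


expr = ("+", "a", ("+", ("*", "b", "x"), ("*", "b", "y")))  # a + (b*x + b*y)
print(rewrite_once(expr, factor_rule))
# ('+', 'a', ('*', 'b', ('+', 'x', 'y')))
```

With the other grouping, (a + bx) + by, the pattern does not occur as a subtree and the rule does not fire, a reminder that rewrite rules act on the parsed structure rather than on the string.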

Sometimes rewrite rules are applicable to any occurrence of the suitable pattern 
(they are “context free”) and at other times they depend on specific conditions 
(“context sensitive”). A notorious example of the latter is L’Hôpital’s Rule. 
Students resist constraints of this sort [Maurer, 1987]. 


4.1. Definitions as macros. Most implementations of the C language allow the 
user to define macros that a preprocessor converts into standard C commands. For 
example, you might want to limit the number of times a program will repeat some 
action. By writing #define maxit 20 you define the macro maxit to have 
the value 20; the preprocessor will replace the word maxit with 20 everywhere it 
appears in the source program. Later, after you have debugged the program, you 
could change maxit to some much larger number and recompile. In general, in 
contrast to this example, macros can have parameters. 

Mathematical definitions play the role of macros in the context of proofs. A 
colleague of mine in computer science who majored in mathematics as an under- 
graduate has described how as a student he suddenly caught on that he could do at 
least B work in most math courses by merely rewriting the definitions of the terms 
involved in the questions and making a few obvious deductions. 


Recommendation. Encourage students to begin proving a theorem by replacing 
(some or all of ) the words that have definitions with the text of their definitions. 
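
The analogy between definitions and macros with parameters can be sketched as text substitution. The function below is a toy with a hypothetical name: it expands only the phrase “a divides b”, matches single-word operands, and introduces a fresh witness name at each expansion.

```python
import re
from itertools import count

DIVIDES = re.compile(r"\b(\w+) divides (\w+)\b")


def expand_divides(statement: str) -> str:
    """Replace each 'a divides b' by the text of its definition."""
    fresh = count(1)

    def definition(match):
        a, b = match.group(1), match.group(2)
        i = next(fresh)  # a new witness name for every expansion
        return f"there is an integer q{i} such that {b} = q{i}*{a}"

    return DIVIDES.sub(definition, statement)


print(expand_divides("m divides n and m divides p"))
# there is an integer q1 such that n = q1*m and there is an integer q2 such that p = q2*m
```

Numbering the witnesses is a design choice of the sketch: a purely textual expansion that reused one name for every occurrence would silently assert that the two quotients are equal.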


4.2. Proof by rewriting. Gries [1991], Dijkstra and Scholten [1990] and others have 
urged that proofs be done by applying formal transformations. Many examples 
may be found in [van Gasteren, 1990] and [Gries and Schneider, 1993]. Proofs are 
not often done that way by mathematicians except in teaching formal logic. The 
preferred method is more often the “semantic” approach: the proof proceeds via 
an understanding of the objects involved rather than by the formal application of 
rules. Gries, and Dijkstra and Scholten, urge a syntactic approach: express the 
desired result in a formal language and apply meaning-preserving transformations 
to it until it becomes a consequence of a known fact. This turns a proof into a type 
of computation. 

I will now give two proofs of a simple statement about the ordering of the real 
numbers drawn from Gries [1991]. The statement to prove is 


$$(x > z) \Rightarrow ((x > y) \lor (y > z)). \tag{2}$$


4.2.1. Semantic proof. The way I proved (2) when I first saw it was to envision x 
and z as points on the line placed this way: 


    --------o----------------o--------
            z                x


There are three different regions into which we can place y. In the right two, 
y > z and in the left two, x > y. End of proof. 
This proof is written in English, not in symbolic notation, and it refers to a 
particular mental representation of the structure in question (the usual ordering of 
the real numbers). 


4.2.2. Syntactic Proof. This proof is due to David Gries (private communication). 
It is based on these principles: 


P.1 (Contrapositive) The equivalence of $P \Rightarrow Q$ and $\neg Q \Rightarrow \neg P$. 
P.2 (DeMorgan) The equivalence of $\neg(P \lor Q)$ and $\neg P \land \neg Q$. 
P.3 The equivalence in any totally ordered set of $\neg(x > y)$ and $x \le y$. 


Proof: 


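$$
\begin{array}{ll}
 & (x > z) \Rightarrow ((x > y) \lor (y > z)) \\
\equiv & \quad \text{by P.1} \\
 & \neg((x > y) \lor (y > z)) \Rightarrow \neg(x > z) \\
\equiv & \quad \text{by P.2} \\
 & (\neg(x > y) \land \neg(y > z)) \Rightarrow \neg(x > z) \\
\equiv & \quad \text{by P.3 three times} \\
 & ((x \le y) \land (y \le z)) \Rightarrow (x \le z)
\end{array}
$$
which is true by the transitive law. 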
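
The ingredients of this calculation can be spot-checked by brute force. The Python sketch below is a sanity check on one small example, not a proof: P.1 and P.2 are tested over all truth assignments, while P.3, transitivity and statement (2) are tested on the integers 0 to 4 with their usual order. All names are hypothetical.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


booleans = (False, True)
chain = range(5)  # one small totally ordered set

# P.1 and P.2 are propositional, so every truth assignment can be tried.
p1 = all(implies(p, q) == implies(not q, not p)
         for p, q in product(booleans, repeat=2))
p2 = all((not (p or q)) == ((not p) and (not q))
         for p, q in product(booleans, repeat=2))

# P.3, the transitive law and statement (2) depend on the ordering.
p3 = all((not (x > y)) == (x <= y) for x, y in product(chain, repeat=2))
transitive = all(implies(x <= y and y <= z, x <= z)
                 for x, y, z in product(chain, repeat=3))
statement_2 = all(implies(x > z, x > y or y > z)
                  for x, y, z in product(chain, repeat=3))

print(p1, p2, p3, transitive, statement_2)  # True True True True True
```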


4.2.3. About the syntactic proof. There are many advantages to the technique 
illustrated by the second proof. It holds in any totally ordered set, not just in the 
real numbers. Each instance of the application of a transformation can be 
mechanically checked to see that it is correctly applied. (Ingenuity, of course, is 
still required to create the proof.) This mechanical verifiability is certainly not true 
in the case of the usual mathematical proof; it is notorious that if you read a proof 
written by someone whose mental representation of the concepts is very different 
from yours, the proof is next to impossible to follow. 


4.2.4. About the semantic proof. There are several arguments for the semantic 
proof. For one thing, many mathematicians prefer the proofs to be written out in 
English sentences rather than in the symbolic notation of 4.2.2. This view is 
advocated in the works on mathematical writing by Halmos [1975] (page 42), 
Steenrod [1975] (page 57), Gillman [1987] (page 15) and Boas [1981]. Another point 
is that it is easy to make mistakes in checking the application of transformations, 
particularly when the patterns that must match are complex. 

However, the major argument for semantic proofs concerns mental represen- 
tations. 


4.3. Mental representations. Several mathematicians who read the syntactic proof 
in 4.2.2 in an earlier version of this paper expressed dissatisfaction with it as 
compared to the pictorial proof preceding it. One objection they gave is that the 
pictorial proof helps them to understand the theorem. I believe that when they say 
that, they mean they want a mental representation of the object involved in the 
theorem that makes the truth of the theorem obvious or easy to understand.⁷ A 


⁷I do not claim that making the truth of the theorem obvious constitutes proof of the theorem. 
Argumentative philosophers of science who suspect mysticism in this paragraph should observe that I 
am making a checkable claim about the behavior of mathematicians, not a philosophical claim about 
Truth. 
mental representation of a particular concept is an elaborate metaphor. If you can 
find the right metaphor for a mathematical object, you can follow proofs concern- 
ing the object much more easily and you can frequently avoid falling into concep- 
tual traps. (Not only that, it is the mental representation that suggests how the 
mathematical structure can be used in applications.) Following the proof line by 
line may convince one that the theorem is correct, but it gives no understanding 
unless the proof aligns in some sense with one’s internal representation of the 
concept.⁸ Indeed, the hope is that it will refine or reform one’s inner representation 
of the concept.⁹ 

The statements in the preceding paragraph about mental representation are 
controversial: they reflect my own position but not the position of all mathemati- 
cians. Those statements caused far more correspondence than any others in this 
article. Some said that they do not use mental representations. For them, mathe- 
matics is all syntax. Others were dismayed by the syntactic proof and felt that the 
primary purpose in teaching was to transmit to the students useful mental 
representations of the concepts. Thus there are two kinds of mathematicians: 
Those for whom the mental representation (they often say “intuition” or “understanding”) 
is paramount, and those who insist that syntax is primary. The gulf 
between these two kinds of mathematicians is vast. It is as if we were two different 
kinds of intelligent beings who are deluded into thinking we are communicating 
with each other. (But we do communicate.) 

It is not unreasonable to assume that some of our students tend one way and 
some the other. I am in the mental representation camp, but in recent years, I 
have used syntax and transformations of statements in class more than I used to, 
and I believe it makes a difference for the better to the students. 


4.4. Explicit use of logic. Even mathematicians in the mental representation camp 
use computation in proofs. Finding a suitable representation of an object that 
allows one to compute is as old as mathematics. However, most mathematicians 
rarely compute explicitly with the rules of logic as exemplified in 4.2.2. I believe 
that mathematicians should be aware of the possibilities of this approach, in 
teaching if not in their own research. 


Recommendation. Transmit your mental representation of concepts whenever you 
can, but also give proofs as explicit logical calculations when appropriate, because 
that provides the student with a second way to deal with the problem and provides 
him or her with the tools to carry out similar proofs. 


Related to explicit mention of logical concepts is the idea of giving a rule of 
inference for particular types of objects. For example, setbuilder notation has the 
following rules of inference: From the statement a ∈ {x|P(x)} (where P(x) is a 
predicate) one can deduce P(a), and from the truth of P(a) one can deduce 


⁸Some readers commented that after reading the syntactic proof in 4.2.2 they suddenly understood 
that in a totally ordered set the condition to be proved is merely the contrapositive of transitivity. So a 
syntactic type of proof can be illuminating, too. 

⁹There is a sizeable research literature on the subject of mathematicians’ and students’ mental 
representations of mathematics. See the articles in [Schoenfeld, 1987a], particularly [Schoenfeld, 1987b] 
and [Maurer, 1987], as well as [Harel and Dubinsky, 1992], [Miller, 1987] and the discussion starting in 
the second column of page 1187 of [Devlin, 1992]. 
a ∈ {x|P(x)}. My students have found this explicit mention of allowable inference 
to be helpful. 
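
Python’s set comprehensions follow setbuilder notation closely, so the two rules of inference can be checked on a small example. The predicate `P` and the finite universe are arbitrary choices made for this sketch.

```python
def P(x: int) -> bool:
    """An arbitrary predicate chosen for illustration: x is a multiple of 3."""
    return x % 3 == 0


universe = range(20)
S = {x for x in universe if P(x)}  # the set {x | P(x)} within the universe

# Membership in S and truth of P coincide for every element considered.
rules_hold = all((a in S) == P(a) for a in universe)
print(rules_hold)  # True
```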


Recommendation. Give explicit rules of inference for concepts when they are intro- 
duced. 


Note that this was done in the specification for ordered pairs in Section 2.3. 

Lamport [1993] provides a detailed model for presenting proofs in a structured 
way that has the potential for clarifying proofs in either style, symbolic or based 
on mental representations. 


5. TYPES AND POLYMORPHISM. In Pascal and in other typed programming 
languages, if you declare a variable to be of type Boolean and then try to set it 
equal to 3 the compiler gives you an error message. This is an example of 
mismatched types. 


5.1. The multiplicity of types. The further students go in mathematics, the more 
different types of data they have to deal with. The typical second or third semester 
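calculus course introduces two and three dimensional vectors, matrices, functions 
$F: \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ (“scalar fields”), functions $F: \mathbb{R} \to \mathbb{R} \times \mathbb{R} \times \mathbb{R}$ (“paths in 
space”) and functions $F: \mathbb{R} \times \mathbb{R} \times \mathbb{R} \to \mathbb{R} \times \mathbb{R} \times \mathbb{R}$ (“vector fields”) to know 
about, as well as all the objects of single-variable calculus. At some point we cease 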
to be able to distinguish all these different things by different letters and typefaces, 
and the students have to learn to understand the types of the expressions they see 
by reading the surrounding text. 

More than that, the meanings of the operator symbols in the formulas at that 
level may depend on the types of the operands. Consider the “×” symbol. In the 
expression 3 × 5 it denotes numerical multiplication. If A and B are three 
dimensional vectors, A × B denotes the vector product, but if they are sets, it 
denotes the cartesian product. In computer science terms, the “×” symbol is 
“polymorphic”, in the sense that its meaning is dependent on the types of its 
arguments. 
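
The three readings of the symbol can be imitated by a single Python function that inspects the types of its operands. The function name `times` and the encodings (numbers as `int` or `float`, vectors as 3-tuples, sets as Python sets) are assumptions made for this sketch.

```python
from itertools import product


def times(a, b):
    """Interpret 'a × b' according to the types of the operands."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a * b  # numerical multiplication
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) == 3:
        (a1, a2, a3), (b1, b2, b3) = a, b
        # vector product of two three dimensional vectors
        return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        return set(product(a, b))  # cartesian product
    raise TypeError("TYPE ERROR: no meaning of × fits these operands")


print(times(3, 5))                  # 15
print(times((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
print(times({1, 2}, {"a"}) == {(1, "a"), (2, "a")})  # True
```

Calling `times(3, {1})` raises `TypeError`, the programming counterpart of a mismatched-type mistake in a formula.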


Recommendation. Use the concepts of type and polymorphism explicitly to help 
students to understand and avoid the traps of type confusion. 


Most students now have had some experience with programming languages that 
use typing. I have found that to refer to types explicitly is helpful. I write “TYPE 
ERROR” on their homework when such a mistake is made, and sometimes when I 
forget to put the little arrow over a symbol for a vector on the blackboard, I get a 
chorus of “TYPE ERROR” from the class, which I think is great. 


5.2. Teaching conceptual distinctions. A particularly bad typing error concerns 
functions. A function F: A → B has a value F(a) at each element of A and, 
particularly in undergraduate mathematics, it may be given by an expression that is 
used to compute its values. The function, its value at an input, and an algorithm 
for computing it are three different mathematical objects that must be kept 
distinct.¹⁰ Students often learn to cope with this in calculus courses by gaining an 


¹⁰Our mathematical ancestors confused these, too [Selden and Selden, 1992], [Sfard, 1992]. 
implicit understanding of the differences. Because the distinctions are not explicit, 
the students’ understanding is not on firm intellectual ground. 
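
A programming language keeps these three things apart as values of different types, which may help make the distinction explicit. In this Python sketch the names are illustrative; `f` and `g` are computed from different expressions but agree at every input tried.

```python
def f(x):
    return (x + 1) ** 2


def g(x):
    return x * x + 2 * x + 1


expression_f = "(x + 1)**2"     # an expression: a string of symbols
expression_g = "x*x + 2*x + 1"
value = f(3)                    # a value: the number 16

print(type(f).__name__, type(value).__name__, type(expression_f).__name__)
# function int str
print(expression_f == expression_g)                 # False
print(all(f(x) == g(x) for x in range(-100, 101)))  # True
```

Confusing any two of these in a program produces an immediate error or a wrong answer. Python itself compares function objects by identity, so `f == g` is `False` even though the two agree at all the sampled inputs.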

For the most part, we do not try very hard to convince our students that a 
function is not the same thing as its defining expression or its values. I have pushed 
that point in some courses I teach and it helps. If you really want them to know it, 
of course, you have to test them on it. Far too many mathematicians are unwilling 
to try testing first and second year students on conceptual content of this sort 
because they feel that most students will fail the questions. In fact, students can be 
taught conceptual distinctions if the teacher starts slowly and asks very simple 
questions at first. All you have to be willing to do is give up about 20% of the 
“content” of the course. I believe the gain far outweighs the loss, in courses for 
non-math-majors and math majors alike. 


Recommendation. Expect conceptual understanding at the appropriate level from all 
students in any course, and test them on it. 


6. SELF-MONITORING 


6.1. Name your behavior. The New Hacker’s Dictionary [Raymond, 1991] is a 
compilation of computer jargon which has to be one of the most enjoyable 
dictionaries ever composed. One thing that becomes noticeable if you read it 
straight through is the number of words and phrases hackers¹¹ have invented to 
describe their own mental states or behavior while working. This is discussed in the 
introduction to the Dictionary. “Juggling eggs”, for example, refers to the necessity 
of keeping a lot of details in your head while modifying a program—with the 
consequence that an interruption can cause you to scramble the program (this is a 
paraphrase of the book’s definition). 

Of course, that is a phenomenon familiar to research mathematicians. You can’t 
spend short, separated pieces of time trying to understand a complicated mathe- 
matical phenomenon; you need the time to concentrate and to get it all in your 
head at once—to be in “hack mode” (p. 190 of [Raymond, 1991]). The point is, 
mathematicians don’t have a name for this, as far as I know. Computer hackers do. 
We should emulate them. 

Computer people give names to their own counterproductive behavior quite 
freely—look up “creationism”, “kluge”, “mung” and “thrash” in [Raymond, 
1991]. (I have personally munged several chapters of my class notes and had to tear 
them up and start over.) It would be particularly helpful to give names to common 
mistakes made by students in math courses. Pólya [1948] emphasized the importance 
of introducing notation for the quantities in a problem. It is equally 
important to name behaviors, both useful and harmful, that occur in problem 
solving. 


Recommendation. Describe and name the common kinds of mistakes students 
make. 


If there is a memorable name for such mistakes the student will be more likely 
(and will find it easier) to monitor his or her behavior. Self-monitoring is widely 
cited in the educational literature as one of the properties that distinguish good 
students from poor ones. See [Resnick, 1987], pages 25-27, and [Schoenfeld, 
1987c]. 


¹¹A hacker is someone who programs for the sake of programming (although useful tools may 
result, they are not the primary motivation) and who enjoys learning and using the obscure features 
and behavior of various operating systems and programming languages. The latter-day meaning of 
someone who breaks into private systems is not intended here. 

One typical mistake occurs when using concepts that by definition require the 
existence of something. For example, for integers m and n, m divides n if there is 
an integer q such that n = qm. When faced with proving that if m divides both n 
and p, then m divides n + p, a common mistake is to write down the assumptions 
as n = qm and p = qm, using the same q. Recently in class I exclaimed, “If Bob 
and Ray are both married, that doesn’t mean they have the same wife!” If I had 
said this on the network, such behavior might have become known as “existential 
bigamy” (or some such phrase). 

It would be useful to come up with punchy names for good behaviors such as 
the following: 


• Working examples before attacking the general case of a problem. 

• Naming all the variables in a problem. 

• Checking special cases of a statement to see if it is consistent with the rest of 
mathematics. (This is sometimes called a “sanity check”.) 


We should also name destructive behaviors such as these: 


• Forgetting to check trivial cases. Hackers have an analogous error they call 
a “fencepost error”; getting the bounds wrong in a loop is an example. 

• Proving an implication backward: in other words, being asked to prove 
P ⇒ Q and coming up with a proof of Q ⇒ P. This is distressingly common 
among students whom I teach mathematical reasoning. 

• Reading variable names as labels ([Nesher and Kilpatrick, 1990], pages 
101-102) so that a statement such as “There are six times as many students 
as professors” gets translated as 6s = p instead of 6p = s (where p and s 
have the obvious meanings). 


6.2. Context. People who have grown up together or who have worked in the same 
place for a long time have what have been called “high-context” conversations 
with many elliptical references to shared ideas, opinions and experiences. A group 
of people from different cultures or who live in a large city and don’t know each 
other well will have “low-context” conversations with more made explicit and more 
attempt to avoid the assumption that the others share one’s point of view. 

Mathematicians seem to avoid connotative, high-context conversation about 
doing mathematics even though the potential is there for communicating quite 
complex ideas about the subject and about one’s behavior while doing it. It is clear 
from the New Hacker’s Dictionary that hackers do have high-context 
communication about these things. Mathematicians have been trained explicitly and through 
bitter experience that connotations can mislead when doing a proof. Perhaps many 
of us have mistakenly applied this lesson to other areas of our life, insisting on 
low-context conversation even when high-context conversation is possible. Or 
perhaps bright people who are not particularly talented at picking up social context 
are attracted to mathematics. 

I don’t know how we can change the mathematical culture to encourage 
high-context interaction. Perhaps the spread of email will help; linguists have 
discovered that linguistic innovation spreads much more rapidly in a language 
spoken by a large number of people in contact with each other than it does in 
small groups. 
Order and Chaos on Your Desk 


Susan Bassein 


This paper describes a physical system which is easy to build (and can fit in a 
small clearing on your desk) and whose dynamics can be easily varied to 
demonstrate some fundamental concepts of dynamical systems. And, because its 
dynamics take place in one dimension, they are simple to analyze: although the 
restriction to one dimension excludes some of the phenomena which are responsible for 
much of the current interest in dynamical systems [1, 4], it allows one to draw 
pictures which illuminate some of the central ideas of the subject [1, 2, 3]. On the 
other hand, examples of chaotic systems (e.g., [1, 4]) are typically either difficult to 
realize in practice or result in dynamics in dimension 2 or higher. For example, the 
complicated (albeit fascinating!) dynamics of a periodically, externally forced, 
damped pendulum in $S^1 \times \mathbb{R}$ are described in [1, 4]. 

To understand what the challenge of designing a system with one-dimensional 
dynamics entails, let us see why the dynamics of the forced pendulum take place in 
$S^1 \times \mathbb{R}$. As shown in Figure 1, the state of the pendulum system can be described 
by three parameters: the position $\theta$ of the pendulum (in $S^1$), the (angular) velocity 
$d\theta/dt$ of the pendulum (in $\mathbb{R}$), and the phase of the oscillation of the external 
force applied at that time (in $S^1$). If we take periodic “snapshots” of the system at 
the moments that the oscillation of the external force passes through some given, 
fixed phase, then the state of the system at those moments can be described by the 
remaining two parameters. Since the state in the next snapshot is a function of the 
state in the current snapshot, we obtain a function from $S^1 \times \mathbb{R}$ to $S^1 \times \mathbb{R}$ whose 
iteration describes the sequence of states observable in the sequence of snapshots. 
For a system’s dynamics to take place in $\mathbb{R}$ instead, its state in each snapshot must 
be described completely by a single parameter. 


Figure 1. A forced pendulum 


My original design was a water tank regulator (which I never built), but because 
electricity is a much more tractable fluid in practice (and electrons are much less 
of a nuisance when they slosh all over your desk), I replaced the flow of water with 
the flow of electrons. An intermediate design, which I built with the help of 
engineers Chuck Iverson and Skip Korhonen, combines electronic and mechanical 
components and can be made to produce either a rhythmic or delightfully chaotic 
noise. But the version that works most reliably and is simplest to build, control, 
and model is an electronic approximate simulation of those other designs. The 
electro-mechanical and electronic systems are pictured in Figure 2. 


Figure 2. The electro-mechanical and electronic systems 


While the physical implementation in each of the three designs differs, the basic 
components, which are listed in the following table, and the organization of each 
system, which is illustrated in Figure 3, are the same; the precise specification of 
the electronic design appears in the Appendix. 


Figure 3. Basic system design 
Component       Water flow        Electro-mechanical     Electronic 

Storage         Water tank        Capacitor              Capacitor 
Inflow          Pipe              +9 volt supply         +9 volt supply 
Outflow         Pipe              Ground                 Ground 
Sensor          Nozzle and        Solenoid and           Capacitor and 
                mechanical arm    mechanical arm         voltage comparator 
Flow selector   Valves            Mechanical switches    Electronic switches 


The operation of the system, illustrated in Figure 4 for the electronic design, is 
as follows. Let $x = x(t)$ be the quantity (of water or charge) in the storage device 
at time $t$. The state of the sensor at time $t$ is described by a single, non-negative 
real number parameter $s = s(t)$ (the angular displacement of a mechanical arm or 
the charge stored in an auxiliary capacitor). In its rest state $s = s_0$ at some time $t_0$, 
the sensor measures $x(t_0)$ and then $s$ rises to a maximum value which depends on 
$x(t_0)$ and then decreases to return to $s_0$ at a time $t = t_1$, which also depends on 
$x(t_0)$. There is a threshold value $s_d$ of $s$ such that when $s < s_d$, the storage device 
is connected to the inflow and when $s > s_d$, the storage device discharges through 
the outflow. 


Figure 4. Operation of the sensor in the electronic system 


The net result of this arrangement is that if $x(t_0)$ is below some threshold value $d$ 
(which corresponds to $s_d$), then the voltage $x(t_1)$ at the next sensing will be greater 
than $x(t_0)$, but the more $x(t_0)$ is above $d$, the greater will be the amount by which $x(t_1)$ 
will be less than $x(t_0)$. Let $F$ be the function which gives $x(t_1) = F(x(t_0))$; in the 
Appendix we show that for the electronic design, if $R$ is the outflow resistance to 
ground, then $F$ is given approximately by the formula 

\[
F(x) =
\begin{cases}
9 - (9 - x)\left(\dfrac{4.03}{x + 2.11}\right)^{0.392} & \text{if } 2.11 \le x \le 4.35, \\[2ex]
9 - \left(9 - x\left(\dfrac{6.46}{x + 2.11}\right)^{2 \times 10^5/R}\right)\left(\dfrac{4.03}{6.46}\right)^{0.392} & \text{if } 4.35 < x \le 9,
\end{cases}
\]



which we will use throughout the remainder of this paper. (If 0 < x < 2.11, then 
the sensor waits until x rises to 2.11 and then resumes cycling.) Here the discharge 
threshold is d = 4.35. The graph of F with R = 20,000 (ohms) appears in 
Figure 5. 
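
As a rough illustration, the piecewise formula for F can be coded directly. The sketch below is a reading of the formula as reconstructed from the Appendix (charging exponent 0.392, discharge exponent 2 x 10^5/R); the names `F`, `A0` and `D` are chosen here for illustration and do not come from the paper.

```python
A0 = 2.11  # lower end a_0 of the domain, from the text
D = 4.35   # discharge threshold d, from the text


def F(x, R):
    """Approximate return map x(t_0) -> x(t_1); x in volts, R in ohms."""
    if not (A0 <= x <= 9.0):
        raise ValueError("x must lie in [2.11, 9]")
    if x <= D:
        # Below the threshold the storage capacitor charges for the whole cycle.
        return 9.0 - (9.0 - x) * (4.03 / (x + 2.11)) ** 0.392
    # Above the threshold it first discharges through R, then charges
    # for the remainder of the cycle.
    discharged = x * (6.46 / (x + 2.11)) ** (2.0e5 / R)
    return 9.0 - (9.0 - discharged) * (4.03 / 6.46) ** 0.392


if __name__ == "__main__":
    R = 20_000
    print(F(A0, R))  # F(a_0); the text quotes 2.23330...
    print(F(D, R))   # b = F(d); the text quotes 5.1352...
```

Because R enters only through the discharge branch, `F(D, R)` is the same for every R, which matches the remark that b is independent of R.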
Figure 5. $x(t_1)$ as a function of $x(t_0)$ and the path of $x$ (both axes run from 0 to 9) 


Since the value of $x$ at the moment $s$ returns to $s_0$ for a sensing determines the 
value of $x$ at the next sensing, the sequence of those values of $x$ (in $\mathbb{R}$) forms the 
dynamics of the system. (In practice, we use the maximum value reached by 
the sensor after each sensing to monitor the state of the system.) In terms of $F$, the 
succession of those states starting, say, at $t = 0$, will form the sequence of iterates 
$F^n(x(0))$ for $n \ge 0$, where “$F^n$” means the composition of $n$ copies of $F$. Figure 5 
shows the standard illustration of how each step of the iteration follows the graph 
of $F$ by moving from a point $(x(t_0), x(t_0))$ through $(x(t_0), x(t_1)) = (x(t_0), F(x(t_0)))$ 
to $(x(t_1), x(t_1))$. 

We can vary the dynamics of the system by controlling the rate of outflow (by 
adjusting the outflow valve or by varying the resistance R between the storage 
device and ground): as illustrated in Figure 6 and proved below, a slow outflow 
makes the sequence of values of x at the moments of sensing approach a fixed 
point, a moderate outflow produces stable periodic behavior, and a fast outflow 
can result in chaos. The standard analysis [1, 2, 3] shows that the “attracting fixed 
point” illustrated in the left picture in Figure 6 results from $|F'(x)| < 1$ at the 
fixed point; a simple computation shows that this will happen for R > 55,900. 
(Note that in practice, resistances can only be specified to an accuracy of 5 or 
10%.) The “attracting period 2 orbit” illustrated in the middle picture results from 
$|F'(x)| > 1$ at the fixed point (which makes it “repelling”) and a pair of points on 
the graph, positioned symmetrically across the diagonal, at which $|(F^2)'(x)| < 1$; 
this will happen for 52,400 < R < 55,800. Attracting orbits of period 4 (or period 
3, or higher periods, if you have a delicate touch on the control) can also be 
obtained from the electronic system from still lower values of R. And we show 
below that for 13,500 < R < 30,000, the system exhibits chaotic dynamics on an 
interval of values for x; in particular, for this range of values of R, the system does 
not have any attracting periodic orbits which would make the observable dynamics 
periodic even in the presence of theoretical chaos, which would be hidden in a 
Cantor set of x values. (In fact, it is possible, with a less simplistic analysis than is 
presented below, to prove that the system will behave chaotically on an even wider 
interval of R values, but the extra work would not be justified by the accuracy of 
the model, so we omit it; see [2, 3] for a deeper and more general view of 
one-dimensional dynamical systems.) 


Figure 6. Dynamics for R = 61,000, R = 53,000, and R = 20,000 
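
The three regimes described here can be explored numerically by iterating F and looking at the tail of the orbit. This is only a sketch under the assumption that the reconstructed formula for F is the intended model; the helper `orbit_tail`, the starting value 3.0 and the iteration counts are illustrative choices, not part of the paper.

```python
D = 4.35  # discharge threshold d from the text


def F(x, R):
    """Approximate return map of the electronic design (x in volts, R in ohms)."""
    if x <= D:
        return 9.0 - (9.0 - x) * (4.03 / (x + 2.11)) ** 0.392
    discharged = x * (6.46 / (x + 2.11)) ** (2.0e5 / R)
    return 9.0 - (9.0 - discharged) * (4.03 / 6.46) ** 0.392


def orbit_tail(x0, R, skip=1000, keep=8):
    """Iterate F `skip` times from x0, then return the next `keep` iterates."""
    x = x0
    for _ in range(skip):
        x = F(x, R)
    tail = []
    for _ in range(keep):
        x = F(x, R)
        tail.append(x)
    return tail


if __name__ == "__main__":
    # The three resistances shown in Figure 6.
    for R in (61_000, 53_000, 20_000):
        print(R, [round(v, 4) for v in orbit_tail(3.0, R)])
```

For R = 61,000 the printed tail should settle on a single value, while for R = 20,000 it should keep wandering inside [a, b] with a = F(b).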
We recall the definition of chaos given in [1] and [3]: 


Definition. A map F from a metric space M to itself is chaotic on M if 


1. $F$ has sensitive dependence on initial conditions: there exists a $\delta > 0$ such 
that for any $x \in M$ and any neighborhood $U$ of $x$, there exist a $y \in U$ and 
an $n \ge 0$ such that $|F^n(x) - F^n(y)| > \delta$; 

2. $F$ is topologically transitive: for any pair $U, V$ of open sets, there is an $n > 0$ 
such that $F^n(U) \cap V \neq \emptyset$; and 

3. Periodic points of $F$ are dense in $M$. 


Notation. Let $d = 4.35$, $a_0 = 2.11$, $b = F(d) = 5.1352\ldots$, and $a = F(b)$; note 
that $b$ is independent of $R$. Thus, the domain of $F$ is $[a_0, 9] \supset [a_0, b]$. Figure 7 
illustrates the relationships between each of these quantities. 


Figure 7. Important points on the graph of F 


Remarks. We make a few observations that will be used in the proofs below. First 
note that for $x \in [a_0, d]$, the value of $F(x)$ does not depend on $R$ and for every 
fixed $x \in (d, b]$, the value of $F(x)$ is an increasing function of $R$ because 
$6.46/(x + 2.11)$ is less than 1. We compute $F(a_0) = 2.23330\ldots > a_0$. Only slightly 
tedious derivative computations verify that for all $R > 0$, the function $F$ is 
increasing on $[a_0, d]$ and concave down on $(a_0, d)$ and decreasing on $[d, b]$ and 
concave up on $(d, b)$; thus $b$ is an absolute maximum for $F$ and $a < b$. Another 
computation shows that the left derivative of $F$ at $x = d$ is greater than 1.06 
(independent of $R$), hence $F'(x) > 1.06$ on all of $(a_0, d)$. 
Proposition 1. If R > 11,600, then for all $x \in [a_0, b]$ there exists an $N \ge 0$ such 
that $F^n(x) \in [a, b]$ for all $n > N$. 


Proof: A computation shows that the condition R > 11,600 guarantees that 
$F(b) > a_0$, so that $F([a_0, b]) \subset [a_0, b]$ and therefore we may iterate $F$ on $[a_0, b]$. First 
suppose $x \in [a_0, d)$: it follows from the remarks above that 
$F(x) - x \ge F(a_0) - a_0 > 0.12$ so there is an $N > 0$ such that $F^N(x) \in [d, b]$. If $x \in [d, b]$ instead, let 
$N = 0$, so in all cases we have $F^N(x) \in [d, b]$. Then, since $F$ is decreasing on 
$[d, b]$, it follows that $F^{N+1}(x) \in [F(b), F(d)] = [a, b]$. Finally, we have 
$F([a, b]) \subset [a, b]$: if $x \in [a, d)$ (in the case $a < d$), then $F(x) \in [F(a), F(d)) \subset [a, b]$, else 
if $x \in [d, b]$, we also have $F(x) \in [a, b]$ by the previous reasoning. QED 
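
The "computation" behind the bound R > 11,600 can be reproduced by solving F(b) = a_0 for R. The following is a sketch only: it assumes the reconstructed formula for F, uses the fact that F(b) increases with R, and the bisection bracket [11,000, 12,000] is a choice made here rather than something taken from the paper.

```python
A0 = 2.11  # a_0 from the text
D = 4.35   # d from the text


def F(x, R):
    """Approximate return map of the electronic design (x in volts, R in ohms)."""
    if x <= D:
        return 9.0 - (9.0 - x) * (4.03 / (x + 2.11)) ** 0.392
    discharged = x * (6.46 / (x + 2.11)) ** (2.0e5 / R)
    return 9.0 - (9.0 - discharged) * (4.03 / 6.46) ** 0.392


def threshold_R(lo=11_000.0, hi=12_000.0, tol=1e-6):
    """Bisect for the resistance at which F(b) crosses a_0."""
    b = F(D, hi)  # b = F(d) does not depend on R
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(b, mid) > A0:
            hi = mid  # F(b) already exceeds a_0, so the crossing is lower
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    print(threshold_R())  # should fall a little below the quoted 11,600
```

The quoted figure 11,600 then reads as this crossing value rounded up, so that every R above it keeps [a_0, b] invariant.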


Proposition 2. If 13,500 < R < 30,000, then for every non-empty open interval 
$I \subset [a, b]$, there is an $n > 0$ such that $F^n(I) = [a, b]$. 


Proof: The condition R < 30,000 guarantees that $a < d$ and that $F(a) < d$, from 
which it follows that there are two numbers, $d_1 \in [a, d)$ and $d_2 \in (d, b]$, such that 
$F(d_1) = F(d_2) = d$. For an interval $J$, let $|J|$ denote its length. We prove the 
proposition by showing that if any two of $d$, $d_1$, and $d_2$ are in $I$, then $F^2(I) = [a, b]$, 
otherwise either $|F(I)| > 1.06|I|$ or $|F^2(I)| > 1.008|I|$, so iterates of $I$ will grow in 
length until the first condition is satisfied. 

Suppose two of $d$, $d_1$, and $d_2$ are in $I$. Since $d_1 < d < d_2$, if $d_1$ and $d_2$ are in $I$, 
then so is $d$, so in any case, $d \in I$. We use the fact that $F$ is increasing on $[a, d]$ 
and decreasing on $[d, b]$: we have either $I \supset [d_1, d]$ or $I \supset [d, d_2]$; in the first case 
we have $F(I) \supset [F(d_1), F(d)] = [d, b]$ and in the second we have 
$F(I) \supset [F(d_2), F(d)] = [d, b]$ also, so $F^2(I) \supset [F(b), F(d)] = [a, b]$, as required. 

Now suppose that no two of $d$, $d_1$, or $d_2$ are in $I$. If $d \notin I$, then either 
$I \subset [a, d]$ or $I \subset [d, b]$. In the first case, since $F'(x) > 1.06$ on $I$, we have 
$|F(I)| > 1.06|I|$. In the second case, on $I$ we have $|F'(x)| \ge |F'(b)| = |F'(F(d))|$; a 
derivative computation shows that $F'(F(d))$ as a function of $R$ is decreasing to the 
left of and increasing to the right of an absolute minimum near R = 19,747, with 
values less than $-1.44$ at R = 13,500 and R = 30,000. Thus, $|F'(x)| > 1.44$ 
on $I$ and $|F(I)| > 1.44|I| > 1.06|I|$. 

Finally, suppose $d \in I$ but neither $d_1$ nor $d_2$ is. Let $I = (d - x, d + y)$. Then 
by the same reasoning as above, $F$ expands the interval $(d - x, d)$ to at least 
length $1.06x$. We claim that $F$ expands the interval $(d, d + y)$ to at least length 
$2.07y$. The slope of the line between the points $(d, F(d))$ and $(d + y, F(d + y))$ is 
more negative than the slope of the line between $(d, F(d))$ and $(b, F(b))$, which is 
$(F(b) - F(d))/(b - d)$. This expression is an increasing function of $R$ because the 
only part of it that depends on $R$ is $F(b)$. For R = 30,000, that slope is less than 
$-2.07$, which proves the claim. It follows that $F$ multiplies the length of $I$ by at 
least a factor of 

\[
\frac{\max\{1.06x,\ 2.07y\}}{x + y} \ge \frac{1.06 \cdot 2.07}{2.07 + 1.06} > 0.7.
\]

Since neither $d_1$ nor $d_2$ is in $I$ and $F(d_1) = F(d_2) = d$, it follows that $F(I)$ lies 
above $d$, so $F(I) \subset [d, F(d)]$. By the reasoning of the previous paragraph, we have 
$|F^2(I)| > 1.44|F(I)| > 1.44 \cdot 0.7|I| > 1.008|I|$, as required. QED 
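
The numerical constants used in this proof (1.06, 1.44, 2.07 and 0.7) can be spot-checked with finite differences. The sketch below assumes the reconstructed formula for F; the helper names and the step size `h` are illustrative choices, and the derivatives are numerical approximations rather than the exact computations referred to in the text.

```python
D = 4.35  # d from the text


def F(x, R):
    """Approximate return map of the electronic design (x in volts, R in ohms)."""
    if x <= D:
        return 9.0 - (9.0 - x) * (4.03 / (x + 2.11)) ** 0.392
    discharged = x * (6.46 / (x + 2.11)) ** (2.0e5 / R)
    return 9.0 - (9.0 - discharged) * (4.03 / 6.46) ** 0.392


def left_slope_at_d(h=1e-6):
    """One-sided difference quotient for the left derivative of F at d."""
    return (F(D, 1.0) - F(D - h, 1.0)) / h  # R is irrelevant on [a_0, d]


def slope_at_b(R, h=1e-6):
    """Central difference for F'(b), where b = F(d) lies inside (d, 9)."""
    b = F(D, R)
    return (F(b + h, R) - F(b - h, R)) / (2.0 * h)


def chord_slope(R):
    """Slope of the chord from (d, F(d)) to (b, F(b))."""
    b = F(D, R)
    return (F(b, R) - b) / (b - D)


if __name__ == "__main__":
    print(left_slope_at_d())                       # claimed to exceed 1.06
    print(slope_at_b(13_500), slope_at_b(30_000))  # claimed below -1.44
    print(chord_slope(30_000))                     # claimed below -2.07
    print(1.06 * 2.07 / (1.06 + 2.07))             # claimed to exceed 0.7
```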


Corollary. If 13,500 < R < 30,000, then $F$ is chaotic on $[a, b]$ and has no attracting 
periodic orbits. 

Proof: To prove that $F$ has sensitive dependence on initial conditions, we let 
$0 < \delta < (b - a)/2$: if $I$ is an open interval with $x \in I \subset [a, b]$, we have 
$F^n(I) = [a, b]$ for some $n > 0$, so choose a $y \in I$ which $F^n$ maps to whichever of $a$ or $b$ is 
further from $F^n(x)$. $F$ is topologically transitive because every non-empty open 
interval eventually maps to all of $[a, b]$. Every non-empty open interval $I \subset [a, b]$ 
contains a periodic point because if $F^n(I) = [a, b]$, then $F^n$ has a fixed point in $I$, 
hence there is a periodic point in $I$ whose period is no more than $n$. Finally, if 
there were an attracting periodic orbit, there would be an interval around each of 
its points which was contracted toward the orbit points by repeated application of 
$F$; Proposition 2 says this cannot happen. QED 


APPENDIX: THE ELECTRONIC DESIGN. As illustrated in Figure 8, the 
electronic design consists of a collection of resistors and capacitors, a potentiometer to 
control the outflow resistance, a 9 volt battery, and four CMOS integrated circuits: 
two “op amps” (operational amplifiers, 3140) and two transistor-pair chips (4007) 
which are wired to make single-pole double-throw switches as described in [5]. 
Figure 2 showed the arrangement on a breadboard, with a meter (microamperes), 
an additional op amp (3140) to provide a high-impedance input to the meter, and a 
1000 μF capacitor across the battery to protect the circuit when the battery is 
connected or disconnected. If a breadboard and a high-impedance input meter (or 
an oscilloscope) can be obtained from an electrical engineering, computer science, 
or physics department, the rest of the parts (battery not included) should cost at 
most $15. 


Figure 8. Circuit diagram (diagram labels: +9 V supply, 680 K and 200 K resistors, output $V_{out}$) 
$R_1$ = 510 K        $C_1$ = 10 μF      $S_1 = S_2$ = 4007 
11 K < R < 61 K    $C_2$ = 0.1 μF     $A_1 = A_2$ = 3140 


Briefly, the circuit works as follows to produce the behavior illustrated in Figure 
4 and the formula given above for F. Recall that if a capacitor with capacitance $C$ 
and initial voltage $V_0$ is connected to an external voltage $V_1$ through a resistance 
$R$, its voltage $V$ at time $t$ will be given by the RC-formula: 
$V = V_1 - (V_1 - V_0)e^{-t/RC}$. In this description, voltages are given in volts, capacitances in μF, and 
for resistances “K” means 1,000 ohms and “M” means $10^6$ ohms. 

The op amp A, is wired as shown to provide hysteresis [5]: its output snaps to 
positive voltage when the voltage on capacitor C, falls below ~ 2.015 and snaps to 
ground when the voltage on C, rises above ~ 2.11. Sensing begins when the 
former happens: switch S$, connects the sensor part of the circuit to storage 
(capacitor C,) and nearly instantaneously charges C, to approximately the same 
voltage x(t,) as is stored in C,. This causes the voltage in C, to rise past ~ 2.11 
and flip S$, to disconnect the sensor from C,. Then C, and C,, with combined 
capacitance 0.2, split their combined voltage $x(t_0) + 2.11$ and then discharge 
through a resistor to return to sense $C_1$ again: by the RC-formula, the sensor 
voltage at time $t$ is $((x(t_0) + 2.11)/2)\,e^{-(t - t_0)/2}$ and this reaches 2.015 when 
$t = t_0 + 2\ln((x(t_0) + 2.11)/4.03)$. 

Op amp $A_2$ controls the flow to and from $C_1$: if the sensor voltage exceeds 
$s_d \approx 3.23$, the output of $A_2$ becomes positive and flips switch $S_2$ to discharge $C_1$ 
through the potentiometer $R$, otherwise $C_1$ charges through $R_1$. Thus, if 
$x(t_0) > 2 \times 3.23 - 2.11 = 4.35$, capacitor $C_1$ discharges for the length of time 
$2\ln((x(t_0) + 2.11)/(2 \cdot 3.23))$ and then charges for the remaining time of the cycle, otherwise $C_1$ 
charges for the entire $2\ln((x(t_0) + 2.11)/4.03)$. The formula for $F$ follows directly 
from this and the RC-formula. 
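
To see how the formula for F follows from the RC-formula, the cycle can be replayed step by step. This is a sketch under stated assumptions: the sensor time constant is taken to be 2 seconds (inferred from the factor 2 in the cycle time), the charging time constant is 10 μF times 510 K = 5.1 seconds, and the discharge time constant is 10 μF times R. The function names are illustrative and not from the paper.

```python
import math


def rc_voltage(v0, v1, t, rc):
    """RC-formula: voltage after time t, starting at v0 and driven toward v1."""
    return v1 - (v1 - v0) * math.exp(-t / rc)


def F_from_rc(x, R):
    """Replay one sensing cycle for storage voltage x (volts) and R in ohms."""
    tau_sensor = 2.0               # seconds; assumed from the cycle-time formula
    tau_charge = 10.0 * 0.51       # 10 uF * 0.51 Mohm = 5.1 s
    tau_discharge = 10.0 * R / 1e6  # 10 uF * R, with R converted to Mohm
    cycle = tau_sensor * math.log((x + 2.11) / 4.03)
    if x > 4.35:
        # Discharge toward ground while the sensor voltage is above 3.23.
        t_dis = tau_sensor * math.log((x + 2.11) / 6.46)
        x = rc_voltage(x, 0.0, t_dis, tau_discharge)
        cycle -= t_dis
    # Charge toward the 9 volt supply for the rest of the cycle.
    return rc_voltage(x, 9.0, cycle, tau_charge)


def F_closed(x, R):
    """Closed-form approximation with the rounded exponent 0.392."""
    if x <= 4.35:
        return 9.0 - (9.0 - x) * (4.03 / (x + 2.11)) ** 0.392
    discharged = x * (6.46 / (x + 2.11)) ** (2.0e5 / R)
    return 9.0 - (9.0 - discharged) * (4.03 / 6.46) ** 0.392


if __name__ == "__main__":
    for x in (2.11, 3.0, 4.35, 5.0):
        print(x, F_from_rc(x, 20_000), F_closed(x, 20_000))
```

The two columns should agree to about three decimal places; the small gap comes from rounding the exponent 2/5.1 to 0.392.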
On the Geometry of Halley’s Method 


T. R. Scavo and J. B. Thoo 


According to Traub [Tra64], Halley’s iteration function (I.F.) “must share with the 
secant I.F. the distinction of being the most frequently rediscovered I.F. in the 
literature.” Halley’s method is a close relative of Newton’s method, an iterative 
technique depicted as a sequence of tangent lines with zeros converging to a root 
of a function. The usual derivation of Halley’s method, however, lacks any obvious 
geometric interpretation. We present a derivation of Halley’s method having such 
an interpretation, and give a brief history of Halley’s work and the method that 
bears his name. 


1. HISTORICAL BACKGROUND. Edmond Halley (1656-1742), well-known 
astronomer and mathematician, was impressed by the work of “an ingenious 
professor of mathematics,” Thomas Fantet de Lagny, who, in a book published in Paris 
in 1692,¹ presented some formulas for “extracting roots of pure powers, especially 
the cubic.” Halley sought to understand the origin of these formulas, and in the 
process came to generalize them. 

The result of de Lagny that impressed Halley² is that $\sqrt[3]{a^3 + b}$ lies between 

\[
a + \frac{ab}{3a^3 + b} \quad \text{and} \quad \frac{a}{2} + \sqrt{\frac{a^2}{4} + \frac{b}{3a}} \qquad (1)
\]

for $a^3 \gg b > 0$. Halley called these the rational formula and the irrational 
formula, respectively [Hal1694]. Each is a special case of more general iteration 
functions derived in Section 3.³ 
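
A quick numerical check of the two formulas in (1) is sketched below for the cube root of 231, the example Halley is said to have worked. The decomposition 231 = 6^3 + 15, that is a = 6 and b = 15, is a choice made here for illustration, as are the function names.

```python
import math


def rational_formula(a, b):
    """de Lagny's rational approximation to the cube root of a^3 + b."""
    return a + a * b / (3 * a ** 3 + b)


def irrational_formula(a, b):
    """de Lagny's irrational approximation to the cube root of a^3 + b."""
    return a / 2 + math.sqrt(a * a / 4 + b / (3 * a))


if __name__ == "__main__":
    a, b = 6.0, 15.0               # 231 = 6^3 + 15 (illustrative choice)
    root = (a ** 3 + b) ** (1 / 3)  # floating-point cube root for comparison
    low, high = rational_formula(a, b), irrational_formula(a, b)
    print(low, root, high)
    print(root - low, high - root)  # distances from the true root
```

For this example the true root lies between the two values and is closer to the irrational one, in line with Halley's remark about his own examples.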

It is ironic that Halley preferred the irrational formula over the rational 
formula, for it is the latter that bears his name. Indeed, virtually all of Halley’s 
calculations employed the irrational formula, of which he wrote [Hal1694] 


And this formula is deservedly preferred before the rational one, which, on account of its large 
divisor, cannot be used without much trouble, in comparison of the irrational one, as manifold 
experience has informed me. 


Apparently, extracting roots was relatively easy for Halley who claimed, for 
example, to have calculated eighteen significant digits of the cube root of 231 “in 
an hour’s time” using the irrational formula. 

Another reason Halley preferred the irrational formula was his belief that it 
generally gives better approximations than the rational formula. Speaking of the 
methods in (1) he said 


And between these two limits always lies the true root, being rather nearer to the irrational than 
to the rational formula... . 


While true of each example given in [Hal1694], this is not true in general, however. 
A counterexample is provided in Section 4. 


¹See [Bat38] for a complete reference. 

²de Lagny also gave a fifth-order formula that Halley found even more impressive. 

³The formulas in (1) may be obtained by setting $f(x) := x^3 - (a^3 + b)$ in Equations (11) and (8), 
respectively, and evaluating at $x := a$. 

Halley also admired the work of the 16th century French mathematician 
François Viète, popularizer of what later became known as “Horner’s method,” an 
approximation technique pioneered by several Chinese mathematicians in the 13th 
century [Boy68]. Viète’s method, as it is sometimes called, is a linearly converging 
algorithm akin to bisection and may be applied to any polynomial with at least one 
real root [Ypm93]. It may also be used to produce starting values for Newton’s 
method and other higher-order iterative procedures, something that Halley himself 
might have done. More importantly, it appears that Viète’s method was an 
important precursor of Halley’s method, and of root-finding methods in general. 

Although Halley was almost certainly aware of the fledgling calculus when he 
wrote his paper in 1694,⁴ he apparently did not realize that his method involved 
derivatives or fluxions, as he would have called them. In hindsight, this is not 
surprising considering it was Simpson, in 1740, who first realized the connection 
between derivatives and Newton’s method [Kol92]. What is surprising is that 
Brook Taylor recognized the derivatives in Halley’s method as early as 1712 
[Fei85]: 


[Taylor] noticed what Halley had failed to realize before him: that the coefficients in [Halley’s 
examples] are directly related to the successive derivatives of the original polynomial... . 


Moreover, applying Halley’s techniques to Kepler’s problem—an outstanding 
problem in astronomy with which Halley was no doubt familiar—led Taylor to a 
remarkable discovery [Bai89, Ypm93]. In a letter to Machin in 1712, Taylor 
proclaimed [Fei85] 


While I was thinking of these things, I fell into a general method of applying Dr. Halley’s 
Extraction of roots to all Problems... And it is comprehended in this Theorem... . 


which turns out to be Taylor’s Theorem! 

The reason that Kepler’s problem went unsolved for so long is that it involves a 
transcendental equation. In a superb summary of Halley’s work, Bateman [Bat38] 
suggests that Halley might have preceded Taylor in the discovery of Taylor’s 
formula had he only “applied his methods in a general way to transcendental 
equations.” While not important in and of itself (after all, Gregory knew of 
“Taylor’s theorem” around 1668), it is noteworthy that it was Halley’s method that 
prompted these developments, whereas Newton’s method languished in ignorance 
until the time of Simpson. 

Despite Taylor’s achievements, he was unable to provide a general formula for 
Halley’s method. It remained for Schröder [Sch1870], more than one-and-a-half 
centuries later, to derive Halley’s iteration function as we now know it. But 
Schröder made no reference to Halley. Indeed, Schröder was primarily interested 
in higher-order iteration functions and mentioned Halley’s formula almost in 
passing. 


⁴Newton published his Principia in 1687, but “only after intense coaxing” by Halley [Boy68]. In fact, 
the well-to-do Halley had the Principia published at his own expense. 

Kobald [Kob1891] derived Halley’s formula in a brief paper published in 1891, 
but unfortunately his derivation is unclear. Frame [Fra44], on the other hand, was 
the first to derive Halley’s iteration function via a second-degree Taylor 
polynomial (see Section 3 and also [Wal48, Ste51, Gan85, Bai89]). Some textbooks also 
employ this method (e.g., [Mcc67]), while others simply mention it. 

Some authors have used determinants and Cramer’s rule to derive Halley’s 
formula and other higher-order iteration functions (see [Ham50, Ste51, Kis54]). 
Other derivations provided by Frame [Fra53] and Traub [Tra61b] used continued 
fractions and Padé approximants, respectively, while Snyder [Sny55] employed a 
technique he called the method of replacement. Finally, Salehov [Sal52] 
introduced the method of tangent hyperbolas discussed in Section 4. 


2. PRELIMINARIES. Given a function $F: X \to X$ and a point $x_0 \in X$, one may 
iterate $F$ to generate the sequence of points $x_0$, $x_1 = F(x_0)$, $x_2 = F(x_1)$, and so 
forth. The sequence thus obtained, 

\[
x_{n+1} = F(x_n) \quad \text{for } n = 0, 1, 2, \ldots, \qquad (2)
\]

is called the orbit of $x_0$ under iteration of $F$. A point $\alpha$ is called a fixed point of $F$ 
if $F(\alpha) = \alpha$. When $X$ is a subset of the real numbers, the graph of $F$ intersects 
the line $y = x$ at each fixed point (see Figure 1). A fixed point $\alpha$ is said to be 
attracting if there exists a neighborhood $U$ of $\alpha$ such that the orbit of every point 
$x_0 \in U$ converges to $\alpha$ under iteration of $F$. Finally, if $|F'(\alpha)| < 1$, then $\alpha$ is an 
attracting fixed point, an important result known over a century ago [Sch1870]. 


Figure 1. An orbit converging to a fixed point. 


Now consider the problem of finding a root $\alpha$ of the equation 

\[
f(x) = 0. \qquad (3)
\]

We assume throughout that the root in question is simple, that is, $f'(\alpha) \neq 0$. One 
way to approximate $\alpha$ is to find another function $F$, called an iteration function 
(I.F.) for $f$, for which $\alpha$ is an attracting fixed point. Then, for a suitably chosen 
initial value $x_0$, the iteration (2) converges to $\alpha$. Note that the choice of I.F. is not 
unique (e.g., [Bur89, page 42]). 
One well-known iterative root-finding method is Newton’s method, 

\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad (4)
\]

a special case of (2) with $F(x) = x - f(x)/f'(x)$. Evidently $\alpha$ is a fixed point of $F$ 
if $\alpha$ is a simple root of (3); furthermore, this fixed point is attracting (see below). 
We call $F$ the Newton I.F. for $f$, and denote it by $N_f$. 
To derive (4), we approximate the given function $f$ at $x = x_n$ by a linear 
function $y$ of the form 

\[
y(x) = a(x - x_n) + b.
\]

Then the requirement that both $f$ and $y$, and their first derivatives, agree at 
$x = x_n$ leads to 

\[
y(x) = f'(x_n)(x - x_n) + f(x_n). \qquad (5)
\]

Finally, solving $y(x_{n+1}) = 0$ for $x_{n+1}$ yields (4). Since (5) is the equation of the 
line tangent to $f$ at $x = x_n$, it is clear that Newton’s method applied to $f$ may be 
interpreted as a sequence of tangent lines with zeros converging to a root of the 
function. (See Figure 2.) 


Figure 2. A geometric interpretation of Newton’s method. 


Newton’s method is a quadratically converging root-finding algorithm. Loosely 
speaking, this means that the number of significant digits eventually doubles with 
each iteration. Such a method gives rise to a second-order algorithm. It can be 
shown that the first derivative of a second-order I.F. vanishes at the corresponding 
fixed point. In the case of Newton’s I.F., the first derivative is 

\[
N_f'(x) = \frac{f(x)\, f''(x)}{f'(x)^2},
\]

which clearly vanishes at a simple root $\alpha$. Hence $\alpha$ is an attracting fixed point. And 
since $N_f''(\alpha) = f''(\alpha)/f'(\alpha)$ is nonzero in general, there exists a neighborhood for 
which (4) converges quadratically to $\alpha$. 


Whereas Newton’s method is second order, we show in Section 3 that Halley’s 
method, 

\[
x_{n+1} = x_n - \frac{2 f(x_n)\, f'(x_n)}{2 f'(x_n)^2 - f(x_n)\, f''(x_n)},
\]

is a third-order algorithm. Such an algorithm converges cubically insofar as the 
number of significant digits eventually triples with each iteration. And not only 
does the first derivative of a third-order I.F. vanish at a fixed point, but so does the 
second derivative. 


3. HALLEY’S METHOD. In Section 2 we derived Newton’s method using a linear
function $y$, the first-degree Taylor polynomial of $f$ at $x_n$. Let’s see what happens if
we instead use a second-degree Taylor polynomial,

$$y(x) = f(x_n) + f'(x_n)(x - x_n) + \frac{f''(x_n)}{2}(x - x_n)^2,$$

where $x_n$ is again an approximate root of $f(x) = 0$. As with Newton’s method, the
goal is to determine a point $x_{n+1}$ where the graph of $y$ intersects the x-axis, that
is, to solve the equation

$$0 = f(x_n) + f'(x_n)(x_{n+1} - x_n) + \frac{f''(x_n)}{2}(x_{n+1} - x_n)^2 \qquad (6)$$

for $x_{n+1}$. Following Frame [Fra44] and others, we factor $x_{n+1} - x_n$ from the last
two terms of (6) to obtain

$$0 = f(x_n) + (x_{n+1} - x_n)\left[ f'(x_n) + \frac{f''(x_n)}{2}(x_{n+1} - x_n) \right],$$

from which it follows that

$$x_{n+1} - x_n = -\frac{f(x_n)}{f'(x_n) + \frac{f''(x_n)}{2}(x_{n+1} - x_n)}. \qquad (9)$$

Approximating the difference $x_{n+1} - x_n$ remaining on the right-hand side of (9)
by Newton’s correction $-f(x_n)/f'(x_n)$ given in (4), we obtain

$$x_{n+1} = x_n - \frac{2 f(x_n)\, f'(x_n)}{2 f'(x_n)^2 - f(x_n)\, f''(x_n)}, \qquad (10)$$

widely known as Halley’s method [Bat38, Ste51, Fra53, Kis54, Sny55, Tra61b, 
Tra64, Dav75, Bro77, Han77, Pop80, Ale81, Gan85]. 
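As a rough illustration of iteration (10), the following Python sketch applies the update to a function supplied together with its first two derivatives. The names `halley_step` and `halley`, the stopping tolerance, and the iteration cap are assumptions made for this example and do not come from the derivation above.

```python
import math


def halley_step(f, df, d2f, x):
    """One step of the rational Halley iteration (10)."""
    fx, dfx, d2fx = f(x), df(x), d2f(x)
    return x - 2.0 * fx * dfx / (2.0 * dfx * dfx - fx * d2fx)


def halley(f, df, d2f, x0, tol=1e-14, max_iter=50):
    """Iterate (10) until two successive iterates differ by at most tol."""
    x = x0
    for _ in range(max_iter):
        x_new = halley_step(f, df, d2f, x)
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x


# Example: f(x) = x^2 - 2, whose positive root is sqrt(2).
root = halley(lambda x: x * x - 2.0, lambda x: 2.0 * x, lambda x: 2.0, 1.0)
print(root, math.sqrt(2.0))
```

Passing the derivatives explicitly keeps the sketch free of numerical differentiation; in practice one might obtain them symbolically or by automatic differentiation.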

Unfortunately, the preceding derivation lacks any clear geometric interpretation
analogous to the tangent lines of Newton’s method. What we seek is an osculating
curve to $f$ at $x_n$ (that is, a curve agreeing with $f$ at $x_n$ up through second
derivative) that interpolates the points $(x_n, f(x_n))$ and $(x_{n+1}, 0)$ where $x_{n+1}$ is
given in (10). If the curve crosses the x-axis but once, so much the better. This
brings us to the so-called “method of tangent hyperbolas,” but first we make a few
remarks concerning Halley’s method.

⁵ One might be tempted to apply the quadratic formula to (6), obtaining

$$x_{n+1} = x_n - \frac{f'(x_n) \pm \sqrt{f'(x_n)^2 - 2 f(x_n)\, f''(x_n)}}{f''(x_n)}. \qquad (7)$$

A judicious choice of sign in (7) (see [Tra64, Gor90]) leads to the I.F.

$$C_f(x) = x - \frac{1 - \sqrt{1 - 2 f(x)\, f''(x)/f'(x)^2}}{f''(x)/f'(x)}, \qquad (8)$$

a general form of Halley’s irrational formula [Hal1694, Bat38, Gan85]. But neither (7) nor (8) is what is
known as Halley’s method. We remark that rationalizing the numerator of (7) yields a special case of
Laguerre’s method [Ost73, Han77] sometimes attributed to Cauchy [Tra64, Pop80].

Denote the Halley I.F. for $f$ by

$$H_f(x) = x - \frac{2 f(x)\, f'(x)}{2 f'(x)^2 - f(x)\, f''(x)}. \qquad (11)$$

If $\alpha$ is a simple zero of $f$, then we see immediately that $H_f(\alpha) = \alpha$. Further, a
straightforward calculation shows that $H_f'(\alpha) = H_f''(\alpha) = 0$ while $H_f'''(\alpha) \neq 0$. Thus,
Halley’s I.F. is third order for simple roots. In fact, a direct computation shows
that

$$H_f'''(\alpha) = -\left[ \frac{f'''(\alpha)}{f'(\alpha)} - \frac{3}{2}\left( \frac{f''(\alpha)}{f'(\alpha)} \right)^2 \right] = -Sf(\alpha),$$

where $Sf(x)$ denotes the Schwarzian derivative of $f$ at $x$, a most curious result.⁶

Bateman [Bat38] was the first to point out that Halley’s method may be
obtained by applying Newton’s method to $f/\sqrt{f'}$, that is,

$$N_{f/\sqrt{f'}}(x) = x - \frac{f(x)/\sqrt{f'(x)}}{\left( f/\sqrt{f'} \right)'(x)} = H_f(x).$$

And despite the uncanny similarity, Halley’s method is not to be confused with a
second-order method for multiple roots discovered by Schröder,

$$x_{n+1} = x_n - \frac{f(x_n)\, f'(x_n)}{f'(x_n)^2 - f(x_n)\, f''(x_n)},$$

obtained by applying Newton’s method to $f/f'$ [Sch1870, Bod49, Tra64, Bur89].


The following special case of Halley’s method is also worth investigating. Let
$g(x) = x^d - r$. Then, by (11), the Halley I.F. for $g$ is

$$H_g(x) = \frac{(d - 1)x^{d+1} + (d + 1)rx}{(d + 1)x^d + (d - 1)r}, \qquad (12)$$

a result often ascribed to Bailey [Bai41, Fra45, Tra61a], but actually due to
Lambert in 1770 [Kis54, Tra61b]. Traub [Tra64] remarks that some early authors
called (12) “Hutton’s method” without reference. Indeed, a footnote in the
English translation of Halley’s paper [Hal1694, page 644] specifically attributes (12)
to Hutton in 1786,⁷ but this clearly postdates Lambert’s work. (See [Bat38, Wal48]
and especially [Bai89] for more information on Lambert’s method.)
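To make formula (12) concrete, here is a small Python sketch that iterates it in exact rational arithmetic. The function name `lambert_step`, the choice of a cube root of 2, and the starting value 1 are illustrative assumptions only.

```python
from fractions import Fraction


def lambert_step(x, d, r):
    """One application of (12), the Halley I.F. for g(x) = x^d - r."""
    xd = x ** d
    return x * ((d - 1) * xd + (d + 1) * r) / ((d + 1) * xd + (d - 1) * r)


# Approximate the cube root of 2, starting from x = 1, with exact fractions.
x = Fraction(1)
for _ in range(3):
    x = lambert_step(x, 3, 2)

print(x, float(x), 2 ** (1 / 3))
```

Using `Fraction` avoids rounding inside the iteration, so any remaining discrepancy with `2 ** (1 / 3)` reflects the method itself (and the final conversion to a float).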

⁶ The Schwarzian derivative is an important tool in the study of discrete dynamical systems [Dev92].
⁷ Hutton was one of the translators.

We close this section with a graphical example comparing the methods of
Newton and Halley. Let $f(x) = x^2 - 2$. Then

$$N_f(x) = \frac{x^2 + 2}{2x} \quad \text{and} \quad H_f(x) = \frac{x^3 + 6x}{3x^2 + 2}.$$

Since $f(\sqrt{2}) = 0$, it follows that $N_f(\sqrt{2}) = H_f(\sqrt{2}) = \sqrt{2}$ (see Figure 3a). Moreover,
this fixed point is attracting since $N_f'(\sqrt{2}) = H_f'(\sqrt{2}) = 0$. And because the
second derivative of $H_f$ also vanishes, whereas the second derivative of $N_f$ does
not, the graph of $H_f$ is flatter than that of $N_f$ near the fixed point (see Figure 3b).
This accounts for the difference in speed at which the two algorithms converge
(see [Wal48, Bod49] for details). In general, the higher the order, the flatter the
graph and, hence, the faster the convergence.

Figure 3. Typical Newton and Halley iteration functions. (a) The iteration functions $N_f$ and $H_f$
have the same fixed point. (b) A blow-up of the boxed region in Figure 3a.
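The difference in speed can also be observed numerically. The sketch below iterates the two iteration functions for $f(x) = x^2 - 2$ from the same starting point and records the absolute errors; the starting value 1 and the helper names are assumptions chosen for illustration.

```python
import math


def newton_sqrt2(x):
    """Newton I.F. for f(x) = x^2 - 2."""
    return (x * x + 2.0) / (2.0 * x)


def halley_sqrt2(x):
    """Halley I.F. for f(x) = x^2 - 2."""
    return (x ** 3 + 6.0 * x) / (3.0 * x * x + 2.0)


def errors(step, x0, n):
    """Absolute errors |x_k - sqrt(2)| for k = 0, ..., n."""
    errs, x = [], x0
    for _ in range(n + 1):
        errs.append(abs(x - math.sqrt(2.0)))
        x = step(x)
    return errs


newton_errs = errors(newton_sqrt2, 1.0, 3)
halley_errs = errors(halley_sqrt2, 1.0, 3)
for k, (en, eh) in enumerate(zip(newton_errs, halley_errs)):
    print(k, en, eh)
```

In double precision the Halley errors reach rounding level after about three steps, so the cubic trend is only visible over the first few iterates.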


4. THE METHOD OF TANGENT HYPERBOLAS. Salehov [Sal52] was apparently
the first to suggest that Halley’s I.F. could be derived using an osculating rational
function of the form

$$y(x) = \frac{x + c}{ax + b}. \qquad (13)$$

(Recall from page 421 that an osculating curve to $f$ at $x_n$ is one that satisfies the
equations

$$y^{(k)}(x_n) = f^{(k)}(x_n) \qquad (14)$$

for $k = 0, 1, 2$.) For convenience, we use an equivalent form of (13),

$$y(x) = \frac{(x - x_n) + c}{a(x - x_n) + b}. \qquad (15)$$

Equations (14) and (15) taken together lead to the system of equations

$$\frac{c}{b} = f(x_n), \qquad \frac{b - ac}{b^2} = f'(x_n), \qquad \frac{2a(ac - b)}{b^3} = f''(x_n),$$

having solution

$$a = \frac{-f''(x_n)}{2 f'(x_n)^2 - f(x_n)\, f''(x_n)}, \qquad
b = \frac{2 f'(x_n)}{2 f'(x_n)^2 - f(x_n)\, f''(x_n)}, \qquad
c = \frac{2 f(x_n)\, f'(x_n)}{2 f'(x_n)^2 - f(x_n)\, f''(x_n)}. \qquad (16)$$


1995] ON THE GEOMETRY OF HALLEY’S METHOD 423 


It follows from (15) that if $y(x_{n+1}) = 0$, then $x_{n+1} = x_n - c$ where $c$ is given in
(16), that is,

$$x_{n+1} = x_n - \frac{2 f(x_n)\, f'(x_n)}{2 f'(x_n)^2 - f(x_n)\, f''(x_n)}.$$

But this is precisely Halley’s method given in (10). In other words, Halley’s formula
can be derived using an osculating hyperbola. Indeed, Halley’s method is sometimes
called the method of tangent hyperbolas [Sal52, Saf63, Tra64].

As an example, consider the function $f(x) = e^x - 2$ which has a unique zero at
$\alpha = \log 2$. The Halley I.F. for $f$ is

$$H_f(x) = x - \frac{2(e^x - 2)e^x}{2e^{2x} - (e^x - 2)e^x} = \frac{(x - 2)e^x + 2(x + 2)}{e^x + 2}.$$

Observe that the graph of $H_f$ is asymptotic to the diagonal lines $y = x \mp 2$ as
$x \to \pm\infty$, respectively (see Figure 4a). Indeed, a direct calculation shows that

$$0 \le H_f'(x) = \left( \frac{e^x - 2}{e^x + 2} \right)^2 < 1$$

for all $x$, making the fixed point globally attracting. Consequently, we may choose
any initial value we please. For instance, using the starting value $x_0 = 10$, system
(16) yields approximately

$$a = -4.539580784 \times 10^{-5}, \qquad b = 9.079161568 \times 10^{-5}, \qquad c = 1.999636834, \qquad (17)$$

from which we obtain $x_1 = x_0 - c = 8.000363166$. Similarly, substituting $x_1 =
8.000363166$ in (16) gives approximately

$$a = -3.351160651 \times 10^{-4}, \qquad b = 6.702321302 \times 10^{-4}, \qquad c = 1.997319071. \qquad (18)$$

Thus, $x_2 = x_1 - c = 6.003044095$.
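The numbers in (17) and (18) can be reproduced with a few lines of Python. The sketch below evaluates the solution (16) of the osculation system for $f(x) = e^x - 2$; the function name `tangent_hyperbola` and the variable names are assumptions made for this example.

```python
import math


def tangent_hyperbola(f, df, d2f, xn):
    """Coefficients (a, b, c) from (16) of the hyperbola (15) osculating f at xn."""
    fx, dfx, d2fx = f(xn), df(xn), d2f(xn)
    denom = 2.0 * dfx * dfx - fx * d2fx
    a = -d2fx / denom
    b = 2.0 * dfx / denom
    c = 2.0 * fx * dfx / denom
    return a, b, c


def f(x):
    return math.exp(x) - 2.0


df = d2f = math.exp  # for this f, f' = f'' = e^x

x0 = 10.0
a0, b0, c0 = tangent_hyperbola(f, df, d2f, x0)
x1 = x0 - c0  # zero of the first tangent hyperbola
a1, b1, c1 = tangent_hyperbola(f, df, d2f, x1)
x2 = x1 - c1  # zero of the second tangent hyperbola

print(a0, b0, c0, x1)
print(a1, b1, c1, x2)
```

Each Halley step is thus the zero $x_n - c$ of the hyperbola fitted at $x_n$; repeating the two lines that compute the coefficients and subtract `c` continues the orbit.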


Figure 4. A visualization of Halley’s method. (a) Halley’s I.F. for $f(x) = e^x - 2$ has a unique
fixed point at $\alpha = \log 2$. (b) Two iterations of Halley’s method with $x_0 = 10$.

The osculating hyperbolas (15) corresponding to (17) and (18) are plotted
alongside the graph of $f$ in Figure 4b. Notice that upon successive applications of
Halley’s method, the zeros of the tangent hyperbolas tend to the zero of $f$. This is
the sought-after geometric interpretation of Halley’s method. Incidentally, continuing
this process numerically, we find that $x_7$ agrees with $\alpha$ to ten decimal places,
and thereafter the number of significant digits roughly triples with each iteration.

The function $f(x) = e^x - 2$ also provides a counterexample to Halley’s claim
that the irrational formula is generally better than the rational formula. Observe
that the first two points on the orbit of $x_0 = 1.3$ under iteration of $C_f$ given in (8)
are

$$1.3 \to 0.60021187\ldots \to 0.69327247\ldots,$$

whereas Halley’s method gives

$$1.3 \to 0.71110978\ldots \to 0.69314766\ldots.$$

Since $\log 2 = 0.69314718\ldots$, we see that Halley’s method gives better approximations
in this case. Thus, contrary to Halley’s claim, the irrational formula does not
always give better approximations than the rational formula.
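The two orbits quoted above can be recomputed directly. The following sketch specializes the rational formula (11) and the irrational formula (8) to $f(x) = e^x - 2$, for which $f' = f'' = e^x$; the function names are hypothetical and chosen only for this illustration.

```python
import math


def f(x):
    return math.exp(x) - 2.0


def halley_rational(x):
    """Rational Halley I.F. (11) for f(x) = e^x - 2."""
    fx, e = f(x), math.exp(x)
    return x - 2.0 * fx * e / (2.0 * e * e - fx * e)


def halley_irrational(x):
    """Irrational formula (8) for f(x) = e^x - 2; here f''/f' = 1."""
    fx, e = f(x), math.exp(x)
    return x - (1.0 - math.sqrt(1.0 - 2.0 * fx * e / (e * e)))


def orbit(step, x0, n):
    """The points x0, step(x0), ..., obtained by n applications of step."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return xs


rational_orbit = orbit(halley_rational, 1.3, 2)
irrational_orbit = orbit(halley_irrational, 1.3, 2)
print(rational_orbit)
print(irrational_orbit)
print(math.log(2.0))
```

Note that the square root in `halley_irrational` requires $1 - 2 f f''/f'^2 \ge 0$, which holds along this particular orbit but is not guaranteed for arbitrary starting values.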
Postscript. After this paper was accepted for publication, the authors learned from W. Gander that
he gave a geometric interpretation of Halley’s method a decade earlier which was deleted from the 
published version of his manuscript [Gan85]. 
Answer to Picture Puzzle
(p. 408) 


This is a portrait of Georg Cantor in his late twenties, around the early 1870s when he was 
starting his work on set theory and transfinite arithmetic. Few photographs of Cantor from this 
period of his life seem to exist, and none have previously been published. 


The lady in the photograph is his sister Sophie, three years his junior. She was important 
through his life. In particular, when he suffered mental illness from 1899, he would sometimes 
stay with her and her family after release from mental hospitals or sanitoria. 


We thank Ivor Grattan-Guinness for providing the photograph. 
The Binary Expansion of $1/p$


A. R. Meijer 


Given a recurring sequence $s_0, s_1, s_2, \ldots$ consisting of 0s and 1s, it is always
possible to find a recurrence relation of the form

$$s_{n+k} = a_{k-1}s_{n+k-1} + a_{k-2}s_{n+k-2} + \cdots + a_1 s_{n+1} + a_0 s_n, \qquad (1)$$

which, if $s_0, s_1, \ldots, s_{k-1}$ are used as initial values, will generate all the subsequent
terms in the sequence. (The $a_i$ in (1) are elements of the field $GF(2) = \mathbb{Z}/2\mathbb{Z}$, and
the operations are assumed to be in that field.) For example, if the sequence has
period $t$, it can always be generated, in a rather trivial way, by the recurrence
relation

$$s_{n+t} = s_n \qquad (2)$$

using the terms $s_0, s_1, \ldots, s_{t-1}$ as initial values.

It is customary to associate with the relation (1) the polynomial

$$h(x) = x^k + a_{k-1}x^{k-1} + \cdots + a_1 x + a_0 \qquad (3)$$

which is said to generate the sequence. It might be better to view (3) as an
annihilator of the sequence, in the sense that, given any segment $s_i, s_{i+1}, \ldots, s_{i+k}$,
where $k$ = degree of $h(x)$ and $i$ is any natural number, one has

$$s_{i+k} + a_{k-1}s_{i+k-1} + \cdots + a_1 s_{i+1} + a_0 s_i = 0.$$


The family of all polynomials which annihilate the sequence in this sense is easily
shown to be an ideal in the principal ideal domain $GF(2)[x]$, and consequently
there exists a polynomial $g(x)$ (of minimal degree in this ideal) such that every
annihilator of the sequence is a multiple of $g(x)$. In particular (2) shows that if the
sequence has period $t$, then $x^t + 1$ is a multiple of $g(x)$. The degree of $g(x)$ is
called the linear complexity of the given sequence. An efficient algorithm, due to
Berlekamp and Massey, exists for finding $g(x)$. (See, for example, [1, p. 176].)

In this note we find $g(x)$ in the particular case where the given sequence is the
binary expansion of $1/p$, where $p$ is a prime such that 2 is primitive modulo $p$: thus
no power of 2 less than $2^{p-1}$ is congruent to 1 modulo $p$, or, equivalently, the
binary expansion is recurring with period $p - 1$. In many applications one would
wish p to be large (of order 10! to 10°) in which case use of the Berlekamp- 
Massey algorithm is infeasible. 

We shall restrict ourselves to primes $p$ of the form $p = 2q + 1$, where $q$ is
itself a prime, $q \equiv 1$ modulo 4. Standard results in number theory (see [6], for
example) then guarantee that 2 is primitive modulo $p$. It should be pointed out,
however, that the final result of this note holds for any $p$ with 2 primitive, and only
minor modifications to the argument are necessary to prove this. Primes of the
form that we consider (“safe primes” under a rather loose definition of “safety”
for use in the RSA cryptosystem) have the advantage of being relatively easy to
find [4], even though it is unknown whether an infinite number of them exist.
By way of a very small example, let us consider the case $p = 11$. Then

     1/11 = 0.0001011101...
     2/11 = 0.0010111010...
     4/11 = 0.0101110100...
     8/11 = 0.1011101000...
     5/11 = 0.0111010001...
    10/11 = 0.1110100010...          (4)
     9/11 = 0.1101000101...
     7/11 = 0.1010001011...
     3/11 = 0.0100010111...
     6/11 = 0.1000101110...

Our interest in the expansion of $1/p$ stems from the fact that the sequence

$$\frac{1}{p} = 0.s_0 s_1 s_2 s_3 \ldots \qquad (5)$$

can be shown to display “good” pseudo-randomness properties, “good” in the
sense that it looks pretty much like a really random sequence of 0s and 1s.
(Postulates for “goodness” were laid down by S. W. Golomb in [3].) The sequence
has in fact been studied for this reason by Blum, Blum and Shub [2]. Moreover,
given $p$, generating the sequence (5) is extremely fast, which would seem to
recommend its use in, for example, bit stream ciphers. (In a bit stream cipher a
message is made unintelligible to anyone not in possession of a secret key, by
adding a pseudo-random bit, modulo 2, to every bit of the message.) If one
observes, as in the example above, that for any integer $x \in [1, p - 1]$ the expansion
of $x/p$ is a cyclic shift of the expansion of $1/p$, this becomes even more
tempting, since $x$ can then be used as the key. Regrettably, as shown in [2], such a
cipher would be very insecure against attack, since only $\lceil \log_2 p \rceil$ consecutive bits
determine $x$ completely: a fact which is easily seen if these $\lceil \log_2 p \rceil$ bits are the
first ones; if the sequence is $b_1 b_2 b_3 \ldots$, $b_i \in \{0, 1\}$, then $x$ is the integer nearest

$$p\left( \frac{b_1}{2} + \frac{b_2}{4} + \frac{b_3}{8} + \cdots \right).$$

The general case, in which the known bits appear somewhere in the middle,
merely involves a cyclic shift of this special case.
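The weakness described above is easy to demonstrate. In the sketch below, the key $x$ is recovered from only the first $\lceil \log_2 p \rceil$ digits of $x/p$. One detail is an assumption of this example rather than a statement from the text: since truncating the expansion can only underestimate $x/p$, and by less than $1/p$, the code rounds $p$ times the truncated sum up to the next integer instead of to the nearest one. The function names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def leading_bits(x, p, m):
    """First m binary digits of x/p."""
    bits, r = [], x
    for _ in range(m):
        r *= 2
        bits.append(r // p)
        r %= p
    return bits


def recover_key(bits, p):
    """Recover x from the leading digits b_1, b_2, ... of x/p."""
    truncated = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
    return ceil(p * truncated)  # truncation underestimates x/p by less than 1/p


p = 11
m = (p - 1).bit_length()  # equals ceil(log2 p) for an odd prime p
recovered = [recover_key(leading_bits(x, p, m), p) for x in range(1, p)]
print(recovered)
```

For $p = 11$ this uses just four digits per key, which is why such a cipher offers essentially no secrecy once $p$ is known.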

For a general discussion of bit stream ciphers and the linear feedback shift
registers (LFSRs) generating pseudo-random bit sequences, we refer the reader to
[7] or [5]. An LFSR may be considered to be an implementation in hardware of the
recurrence relation (1), as in figure 1, in which the square boxes represent single
bit memory registers (“flip-flops”), the switches represent the coefficients $a_i$ (i.e. if
$a_i = 0$ the $i$th switch is open, if $a_i = 1$, it is closed) and $\oplus$ denotes addition
modulo 2. At each clock pulse, the contents of the registers are shifted in the
indicated direction. The reader may verify that if the registers in figure 2 initially
contain 000101, then the output sequence will be the binary expansion of $1/11$.

Figure 1. [Diagram: an LFSR with switches $a_0, a_1, a_2, \ldots, a_{k-2}, a_{k-1}$ and an output line.]

Figure 2. [Diagram: an LFSR with an output line.]


The linear complexity of the sequence, that is the minimal degree among all
polynomials which generate the sequence, now translates into the minimal number
of memory registers required to produce the sequence. Figure 2 shows that the
sequence (4) has a linear complexity of at most 6, or equivalently, that (4) may be
generated by the recurrence relation

$$s_{n+6} = s_{n+5} + s_{n+1} + s_n$$

with which we associate, as in (3), the polynomial

$$x^6 + x^5 + x + 1. \qquad (6)$$

Linear complexity measures to some extent the pseudo-randomness of the sequence,
but only in one direction: a “good” pseudo-random sequence must have a
high linear complexity. (To see that the converse is false, consider the sequence
consisting of 10°° zeros, followed by a one, and then recurring. This has linear 
complexity = period = 10°° + 1, but does not appear very random!) 
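The six-register recurrence associated with (6) can be simulated in software. The following sketch runs $s_{n+6} = s_{n+5} + s_{n+1} + s_n$ over GF(2) from the initial contents 000101; the function name `lfsr_sequence` and the representation of the taps as a tuple of offsets are assumptions of this example.

```python
def lfsr_sequence(initial, taps, n):
    """Generate n terms of s_{m+k} = sum of s_{m+j} for j in taps, modulo 2."""
    s = list(initial)
    k = len(initial)
    while len(s) < n:
        m = len(s) - k
        s.append(sum(s[m + j] for j in taps) % 2)
    return s[:n]


# s_{n+6} = s_{n+5} + s_{n+1} + s_n with initial contents 000101
bits = lfsr_sequence([0, 0, 0, 1, 0, 1], taps=(0, 1, 5), n=20)
print("".join(map(str, bits)))
```

The first ten output bits should reproduce the digits of $1/11$ listed in (4), after which the pattern repeats with period 10.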

We shall show that, if $p$ is of the form $p = 2q + 1$, with $q \equiv 1$ modulo 4 also
prime, then the linear complexity of the binary expansion of $1/p$ is $\frac{1}{2}(p + 1)$.

To this end, note in the first place that, by Euler’s theorem, $2^{(p-1)/2} \equiv p - 1$
modulo $p$. Thus $1/p$ and the fractional part of $2^{(p-1)/2}/p$ add up to $1 = 0.11111\ldots$
It follows immediately that the first $(p - 1)/2$ digits of $1/p$ and the second $(p - 1)/2$
digits are each other’s complements. This implies of course that the recurring part
of $1/p$ contains equal numbers of 0s and 1s. It is also clear that the expansions of $2/p$
and of the fractional part of $2^{(p+1)/2}/p$ are similarly complementary (as shown in the
example above by $1/11$ and $10/11$ and by $2/11$ and $9/11$). Thus, if we denote by $c_k$ the
expansion of the fractional part of $2^k/p$ then we have, adding componentwise
modulo 2, that

$$c_{(p+1)/2} + c_{(p-1)/2} + c_1 + c_0 = 0$$
or, in the form of equation (1), that

$$s_{n+(p+1)/2} = s_n + s_{n+1} + s_{n+(p-1)/2}$$

or, in terms of a generating polynomial, that

$$h(x) = 1 + x + x^{(p-1)/2} + x^{(p+1)/2}$$

generates the sequence. The linear complexity of the $1/p$ sequence is therefore at
most $(p + 1)/2$.

To prove that it cannot be less than this, suppose that a relation of the form

$$a_0 c_n + a_1 c_{n+1} + \cdots + a_{k-1} c_{n+k-1} + c_{n+k} = 0 \qquad (7)$$

holds, with $k$ minimal. Since the first $p - 1$ components of $c_i$ contain equal
numbers of 0s and 1s, it is clear that the number of nonzero terms in (7) must be
even. Moreover, for $0 \le k < p - 1$, all the $c_k$ are distinct, so the number of
nonzero terms in (7) must be greater than 2.

Next recall that $g(x)$ must be a divisor of

$$x^{p-1} + 1 = \left( x^{(p-1)/2} + 1 \right)^2 = (x + 1)^2 \left[ x^{(p-3)/2} + \cdots + x + 1 \right]^2.$$

Now the term inside the square parentheses has an odd number of terms (since
$p \equiv 3$ modulo 8) and therefore it cannot be the generator polynomial nor, clearly,
can any of its divisors. Obviously $g(x) \neq x + 1$.

The product of $x + 1$ and the term inside the square parentheses has only two
terms, and is therefore also ineligible. The polynomial of least degree which might
do the trick is therefore

$$(x + 1)^2 \left[ x^{(p-3)/2} + \cdots + x + 1 \right] = x^{(p+1)/2} + x^{(p-1)/2} + x + 1$$

and it follows that the linear complexity of the sequence (4) is therefore at least
$(p + 1)/2$, which completes the proof. In fact, putting these two parts of the proof
together, we see that

$$g(x) = x^{(p+1)/2} + x^{(p-1)/2} + x + 1$$

is of least degree generating the sequence, of which (6), or the associated circuit of
figure 2, is an example.

It is interesting that this method enables one to generate pseudo-random bit 
sequences of, apparently, arbitrarily high linear complexity; certainly of higher 
complexity than could ever be physically implemented in an LFSR. 
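For small primes the value $(p + 1)/2$ can be checked experimentally with the Berlekamp-Massey algorithm mentioned earlier. The sketch below is a textbook-style GF(2) implementation written for this illustration, not code from the note; the test primes 11, 59 and 107 are an assumption of the example, chosen because each has the form $2q + 1$ with $q \equiv 1$ modulo 4 prime.

```python
def binary_expansion(x, p, n_bits):
    """First n_bits binary digits of x/p."""
    bits, r = [], x
    for _ in range(n_bits):
        r *= 2
        bits.append(r // p)
        r %= p
    return bits


def linear_complexity(s):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR generating s."""
    n = len(s)
    c = [0] * (n + 1)  # current connection polynomial
    b = [0] * (n + 1)  # connection polynomial before the last length change
    c[0] = b[0] = 1
    length, m = 0, -1
    for i in range(n):
        # discrepancy between s[i] and the current LFSR's prediction
        d = s[i]
        for j in range(1, length + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length, m, b = i + 1 - length, i, t
    return length


for p in (11, 59, 107):
    bits = binary_expansion(1, p, 2 * (p - 1))  # two full periods
    print(p, linear_complexity(bits), (p + 1) // 2)
```

Two full periods are supplied so that the algorithm sees at least twice as many bits as the expected linear complexity, which is what it needs to settle on the minimal recurrence.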
Pick’s Formula via the Weierstrass
℘-Function


Ricardo Diaz and Sinai Robins 


1. INTRODUCTION. If the vertices of a polygon lie on the lattice of points in the
plane whose coordinates are integers, then Pick’s formula provides a method for
computing the area of the polygon simply by counting lattice points. Let $B$ denote
the number of lattice points that lie on the edges of a simple closed polygon
(including vertices), and let $I$ denote the number of lattice points that lie within
the interior of the polygon. The area $A$ of the polygon, according to Pick, is

$$A = I + \tfrac{1}{2}B - 1.$$

This simple relation is comprehensible to a fifth grader [17], yet the theorem
continues to intrigue modern researchers because of its deeper connections with
combinatorics and algebraic geometry. A multitude of alternative proofs and
generalizations of Pick’s formula have appeared since the original proof in 1899
([7], [8], [9], [11], [15], [25]). The first generalization to lattices in three dimensions
was made by J. E. Reeve in 1957 [19], who cleverly introduced an auxiliary lattice.
His work was subsequently extended by others to higher dimensions as part of an
analysis of the so-called ‘Ehrhart polynomials’ of polyhedra ([4], [5], [12], [14], [18],
[20], [22], [24]). Recently, connections between the algebraic-geometrical properties
of toric varieties and the number of lattice points in polyhedra have been
investigated ([2], [18]). There are also generalizations of Pick’s formula to other
types of archimedean lattices ([10], [21]); and to self-intersecting polygons ([10],
[23], [26]). Although our complex-analytic proof of Pick’s formula is not the
shortest available (by any means!), it does expose a host of connections between
lattice geometry, the Weierstrass ℘-function, classical magnetostatics, and
Kodaira-Hodge-DeRham theory. Our approach is motivated by the idea that Pick’s
theorem is a discrete version of Green’s theorem in the plane.
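As a concrete check of Pick’s formula, the sketch below computes the area of a lattice polygon by the shoelace formula, counts boundary points with greatest common divisors, and then solves $A = I + \tfrac{1}{2}B - 1$ for $I$. The function names and the sample triangle are assumptions chosen for illustration.

```python
from math import gcd


def twice_area(vertices):
    """Twice the area of a simple lattice polygon (shoelace formula)."""
    n = len(vertices)
    s = 0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s)


def boundary_points(vertices):
    """Number B of lattice points on the edges, vertices included."""
    n = len(vertices)
    total = 0
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += gcd(abs(x1 - x0), abs(y1 - y0))
    return total


def interior_points_by_pick(vertices):
    """Solve A = I + B/2 - 1 for the number I of interior lattice points."""
    return (twice_area(vertices) - boundary_points(vertices) + 2) // 2


triangle = [(0, 0), (4, 0), (0, 3)]
print(twice_area(triangle), boundary_points(triangle), interior_points_by_pick(triangle))
```

Working with twice the area keeps every quantity an integer, so no floating-point rounding enters the count.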

We first summarize for the reader some classical results in complex analysis 
pertaining to complex-valued functions that are doubly-periodic; that is, invariant 
with respect to translation by integral multiples of two linearly-independent vectors 
in the complex plane. Those readers already conversant with these results will 
realize that traditionally in complex analysis these vectors are not required to be 
perpendicular or of unit length; but for the purposes of this paper we found it 
convenient to adopt the simplifying convention that the two vectors correspond to 
1 and i. The lattice they generate is called the Gaussian lattice. Of course a version 
of Pick’s theorem holds for any lattice generated by two linearly independent 
vectors in the complex plane, since an affine transformation leaves the linear 
relationship of Pick’s formula invariant except for a multiplicative change of scale 
for area. 
Definition 1. A function $f(z)$ is said to be doubly-periodic with periods $i$ and 1 if
$f(z) = f(z + i) = f(z + 1)$. This implies that $f(z + m + in) = f(z)$ for all integers
$m, n$.

If f(z) is doubly-periodic, then the integral of f(z) dz along opposite sides of a 
parallelogram cancel if the vertices of the parallelogram lie on the square lattice 
and if the sides carry opposing orientations. Note that despite the use of complex 
notation, we are not requiring here that f(z) be an analytic function. 


Definition 2. The Weierstrass ℘-function for the Gaussian lattice is defined by

$$\wp(z) = z^{-2} + \sum_{a} \left[ (z - a)^{-2} - a^{-2} \right]$$

in which the sum extends over all Gaussian lattice points except $a = (0, 0)$.

The sum converges uniformly and absolutely on compact subsets of the lattice-punctured
plane. Because the sum is invariant under rearrangement of its terms,
one can check easily that $\wp(z)$ is a doubly-periodic function which is analytic
except at the lattice points. Because the lattice is invariant under multiplication by
$i$, one can also verify from the preceding formula that $\wp(iz) = -\wp(z)$. The residue
of $\wp(z)$ at each pole is zero, because all poles are double poles.


Definition 3. An antiderivative for the function $-\wp(z)$ is given by the Weierstrass
zeta function

$$\zeta(z) = z^{-1} + \sum_{a} \left[ (z - a)^{-1} + a^{-1} + z a^{-2} \right].$$

This series also converges uniformly and absolutely on compact subsets of the
lattice-punctured plane. One can verify that $d\zeta(z)/dz = -\wp(z)$ by termwise
differentiation of the series defining $\zeta(z)$. Note that $\zeta(z)$ has its poles located
precisely at all lattice points, and that the residue of $\zeta(z)$ at each pole is 1. It is
easy to verify that $\zeta(z)$ is an odd function. This follows from the fact that the
lattice is invariant under multiplication by $-1$. Unfortunately, $\zeta(z)$ is not doubly-periodic
(but rather is ‘pseudo-periodic’ in the terminology of [13]).


§2. THE INGREDIENTS. The following Lemma shows that the Weierstrass
ζ-function is only a conjugate-analytic, linear term away from being doubly periodic.
We could show this by using the ‘Legendre-relations’ for $\zeta$ [13], but instead
include a self-contained proof for completeness.


Lemma 1. There exists a constant $\alpha$ such that the function $\phi(z) = \zeta(z) - \alpha\bar{z}$ is
doubly-periodic with periods $i$ and 1.


Proof: The obstruction to double-periodicity of $\zeta(z)$ is the nonvanishing of the
expression

$$\zeta(z + m + in) - \zeta(z) = -\int_0^{m+in} \wp(z + w)\, dw.$$

The preceding definite integral is independent of path because the residues of
$\wp(z)$ are zero. Because $\wp(z)$ is doubly-periodic, this integral can be expressed as
the sum of $m$ duplicates of an integration taken along a horizontal linear path of
length one and $n$ duplicates of an integration taken along a vertical linear path of
length one. Thus the integral equals $m\alpha + n\beta$ where $\alpha(z) = -\int_0^1 \wp(z + w)\, dw$
and $\beta(z) = -\int_0^1 \wp(z + iw)\, i\, dw$. Since the integrand in the definition of $\alpha(z)$ is
periodic on each horizontal line, the quantity $\alpha(z)$ is invariant under horizontal
translations of $z$; hence $\alpha(z) = \alpha(y)$. Similarly $\beta(z) = \beta(x)$. However, the expression
$\zeta(z + m + in) - \zeta(z) = m\alpha(y) + n\beta(x)$ must be analytic in the variable $z$
on the lattice-punctured plane. Set $m = 0$ to deduce that $\beta(x)$ must be analytic on
the lattice-punctured plane. But from the Cauchy-Riemann equations it follows
that an analytic function that depends on only one of the real coordinates must be
constant. Similarly $\alpha$ must be constant. Returning then to the definitions of $\alpha$ and
$\beta$ one sees upon integration of the identity $\wp(iz) = -\wp(z)$ that there is the
relation $\beta = -i\alpha$. Thus $\zeta(z + m + in) - \zeta(z) = (m - in)\alpha = [\overline{(z + m + in)} -
\bar{z}]\alpha$. This implies that $\zeta(z) - \alpha\bar{z}$ is doubly-periodic. Q.E.D.


The doubly-periodic function $\phi(z)$ is not analytic because of the presence of the
conjugate-analytic, linear term $\alpha\bar{z}$. Nevertheless, $\phi(z)$ has some very nice integral
properties. Let $C$ denote a canonically oriented curvilinear polygonal path in the
complex-plane; that is, a piecewise continuously-differentiable, simple closed curve
that winds once counterclockwise around a bounded, simply-connected domain $D$.


Lemma 2. If $C$ passes through no lattice points, then the number $I$ of lattice points
inside $D$ is related to the area $A$ of $D$ by the formula

$$\frac{1}{2\pi i} \int_C \phi(z)\, dz = I - A.$$


Proof: $\int_C \phi(z)\, dz = \int_C [\zeta(z) - \alpha\bar{z}]\, dz = \int_C \zeta(z)\, dz - \alpha \int_C (x - iy)(dx + i\,dy)$. By the
Residue Theorem, $(2\pi i)^{-1} \int_C \zeta(z)\, dz$ is the sum of the residues of $\zeta(z)$ at all
interior poles. There are $I$ such poles, and each pole of $\zeta(z)$ has residue 1. Thus
$(2\pi i)^{-1} \int_C \zeta(z)\, dz = I$.

On the other hand, Green’s Theorem can be used to show that $\int_C (x - iy)(dx +
i\,dy) = \int_C (x - iy)\, dx + (y + ix)\, dy = \iint_D [(y + ix)_x - (x - iy)_y] = \iint_D 2i = 2iA$.
Multiply by $\alpha$ and combine with the results of the preceding paragraph to deduce
that $\int_C \phi(z)\, dz = 2\pi i I(D) - 2i\alpha A$. To complete the proof of the lemma, it merely
remains to show that $\alpha = \pi$. This can be deduced by taking $C$ to be a square path
centered at the origin of sidelength one, encircling the origin once counterclockwise.
Since $\phi(z)$ is doubly-periodic, the path integral $\int_C \phi(z)\, dz$ vanishes. Thus,
$0 = 2\pi i I(D) - 2i\alpha A(D)$. For the square, obviously $I = 1$ and $A = 1$. This forces
$\alpha = \pi$. Q.E.D.


It is clear from the preceding lemma that the imaginary part of the expression
$\phi(z)\, dz$ provides a link between the number of vertices enclosed by a contour $C$
and the area of the region enclosed by $C$. We now generalize Lemma 2 by
considering curvilinear polygons $C$ that perhaps pass directly through finitely many
lattice points. It is easy to verify that the singularity of the imaginary part of
$\phi(z)\, dz$ at each lattice point is the same as the singularity of the expression
$d\theta = (-y\,dx + x\,dy)/(x^2 + y^2)$ at the origin. This singularity is mild enough to give
meaning to the improper integral $\int_C \Im[\phi(z)\, dz]$ whenever $C$ decomposes as the
union of finitely-many parametrized paths $C(t)$, along each of which the derivative
$dC/dt$ is continuous and never vanishing. The improper integral is defined to be
the limiting value of the path integral that remains after deletion of the portion of
$C$ trapped within a ball of small radius centered at each singularity of the
integrand. Since the remaining path of integration is no longer a closed path, we
use the standard trick of introducing an auxiliary circular arc that winds partially
around each singularity, that stays within $D$, and that closes up the disconnected
path of integration near the singularity. The resulting closed curve satisfies the
hypotheses of Lemma 2. The auxiliary arcs have the effect of introducing correction
terms into the conclusion of Lemma 2 that depend upon both the residue of
the integrand at each singularity of the integrand and the radian measure of the
associated auxiliary arc:


Lemma 3. Let $C$ be a parametrized simple closed curve as above. Let $\Theta_j$ denote the
radian measure of the counter-clockwise interior angle within $D$ at the $j$’th lattice
point on $C$ formed by the incoming and outgoing tangent vectors to $C$ through this
vertex (see figure 1). Then,

$$\frac{1}{2\pi} \int_C \Im[\phi(z)\, dz] = I - A + \frac{1}{2\pi} \sum_j \Theta_j.$$


Figure 1


Lemma 4. For every curve $C$ described above, the path integral of the imaginary part
of $\phi(z)\, dz$ taken along $C$ agrees with the path integral taken along the antipodal
curve $\tilde{C} = -C$, whose parametrization is related to that of $C$ by $\tilde{C}(t) = -C(t)$.


Proof: It suffices to verify that the expression $\phi(z)\, dz$ is invariant under the
change of variables $z \to -z$. Recall that $\phi$ is a linear combination of $\zeta$ and $\bar{z}$,
both of which are odd functions of $z$. Thus $\phi(-z)\, d(-z) = [-\phi(z)](-dz) =
\phi(z)\, dz$. Q.E.D.


Lemma 5. If the path $C$ in Lemma 4 is actually a polygon, then $\int_C \Im[\phi(z)\, dz] = 0$.


Proof: The proof is easier to understand if one first selects a so-called extreme
vertex of $C$, which is a vertex with the property that some closed half-plane
intersects $C$ only at the vertex. Translate the polygon $C$ so that one such extreme
vertex is the origin. Consider the antipodal map $z \to -z$ that maps $C$ to $\tilde{C}$. The
interior of $C$ and the interior of $\tilde{C}$ do not overlap, nor do they share any boundary
points in common except the extreme vertex at the origin. Apply Lemma 3 to $C$,
and use Lemma 4 to see that $2\int_C \Im[\phi(z)\, dz] = \int_C \Im[\phi(z)\, dz] + \int_{\tilde{C}} \Im[\phi(z)\, dz]$. From
even the crudest diagram it is clear that each line-segment on $C$ is a translate of
the corresponding antipodal segment on $\tilde{C}$, but endowed with the reverse orientation.¹
The double-periodicity of $\phi$ therefore implies that the integral of $\Im[\phi(z)\, dz]$
over $C + \tilde{C}$ vanishes because of pairwise cancellation of integrals. Thus
$\int_C \Im[\phi(z)\, dz] = 0$. Q.E.D.


§3. PROOF OF PICK’S FORMULA AND FURTHER COMMENTS. The preceding
lemmas provide all the information needed to establish Pick’s Formula.


Proof: Apply Lemma 5 to the polygon $C$. Deduce from Lemma 3 that $A - I =
(1/2\pi)\sum_j \Theta_j$. The supplementary angles $\Theta_j - \pi$, sometimes called the clockwise
exterior angles to the polygon, sum to $-2\pi$ because 1 is the total winding number
about the origin of the curve swept out by the tangent vector to $C$ [3, p. 217].
Therefore the sum of the interior angles is $\pi(B - 2)$, where $B$ denotes the
number of boundary lattice points. Thus $A - I = (1/2)(B - 2)$. Q.E.D.


Comment 1. (Connections with complex cohomology) We had an explicit formula
for $\phi(z)\, dz$ that made it easy to establish Lemmas 1-5. A less explicit proof of the
existence of a doubly-periodic function having the properties described in these
lemmas could have been established by Hodge-theoretic methods (see [27] for a
good introduction to Hodge-Theory). Consider the complex manifold obtained by
identifying opposite edges of the unit square to obtain a torus. Place a point mass
($\delta$-function) at the center of the square, and consider the expression $\omega = (2\pi i)
[\delta - 1]\, dx\, dy$. The integral of this two-form over the unit square vanishes, which by
Hodge Theory is the necessary and sufficient condition for there to exist a
complex-valued generalized function $\Phi$ such that the one-form $\Phi(z)\, dz$ satisfies
$d[\Phi(z)\, dz] = \omega$ on the torus. Note that $\omega$ is invariant under the change of
variables $z \to -z$, since the delta-function and the constant function 1 are even
functions with respect to this transformation. To obtain a one-form that is also
invariant under this transformation, take $\phi(z)$ to be the odd part of $\Phi(z)$. With
this choice we still have $d(\phi\, dz) = \omega$. Regard $\phi$ as a doubly-periodic odd function
on the complex plane. Lemma 2 now follows upon integration of the identity
$d(\phi\, dz) = \omega$ (Green’s Theorem). Lemma 3 follows by taking the imaginary part of
Lemma 2 and by noting that the local structure of the singularity of $\phi(z)$ at each
lattice point must be like $\zeta(z)$ since $d[\phi\, dz - \zeta(z)\, dz]$ is a two-form on the torus
that is free of singularities. Lemma 4 follows from the oddness of $\phi$. The
remaining arguments (Lemma 5 and the proof of Pick’s Formula) require no
further modifications.


Comment 2. (Connections with magnetostatics) In the theory of classical magneto- 
statics, the one-form S[¢(z) dz] can be identified with the magnetic force field B 
in R° induced by the steady flow of two opposing currents whose combined density 
distribution is yw. The singular and regular parts of & correspond to a current 
concentrated on a lattice of parallel wires, and a spatially uniform opposing 
current, respectively. Lemma 5 can be interpreted as a restricted conservation law 
satisfied by this induced magnetic field valid for all lattice polygons (see figure 2). 


¹This is the only part of the proof where we require that the sides of C be line segments rather than
curves.

Figure 2


Comment 3. (Connections with hyperbolic lattices) Pick’s theorem fails in the 
hyperbolic plane. That is, there is no linear relationship between the area of a 
geodesic polygon and the number of lattice points which it encloses. To make this 
precise, we should first define what a lattice means in the hyperbolic plane. The 
Modular Lattice is defined by 


$$\mathcal{L} = \left\{ \frac{a\rho + b}{c\rho + d} \;:\; \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL_2(\mathbb{Z}) \right\},$$


where ρ = e^{2πi/3}. We define a polygon to be a simple closed curve which is the
finite union of geodesic arcs (which must be circular arcs perpendicular to the
x-axis, or vertical line segments), where the vertices of the polygon are now in $\mathcal{L}$.

As is well known [1], Gauss-Bonnet tells us that the area of a hyperbolic triangle
is π minus the sum of its interior angles. Fix the triangle ΔABC, where A = ρ,
B = ρ + 1, C = i∞. Since ∠ABC = ∠CAB = π/3, and ∠BCA = 0, the area of
ΔABC is π − (π/3 + π/3 + 0) = π/3. Now fix the triangle ΔABD, where
D = ρ + 2. Thus


area(ΔABD) = π − (ε + 2π/3 + ε) = π/3 − 2ε,


where ε > 0. Thus area(ΔABD) ≠ area(ΔABC).

If there were a linear relationship between the area of a geodesic polygon and 
its boundary and interior lattice points, then both of the above triangles would 
have the same area, because they both have the same number of interior and 
boundary lattice points. The fact that they do not have equal areas proves our 
claim. A similar argument shows that Pick's theorem fails for any lattice in the
hyperbolic plane which is generated by a discrete subgroup of SL₂(R). Thus there
is something unique about the Archimedean case.


Comment 4. (Connections with the Weierstrass ℘-function) Recall that in the
proof of Lemma 2 we established that a = π. That is, Pick's Formula has provided
us with properties of the Weierstrass ℘-function!


NOTES


Edited by: John Duncan 


Permutations as Products 
of Transpositions 


George Mackiw 


When writing a permutation as a product of transpositions, what is the smallest 
number of transpositions that can be used? This question and variants of it occur 
both abstractly [2] and in applied settings such as data exchange and sorting [3]. 
The answer is known and easily stated: the minimum number is precisely n − r,
where r is the number of disjoint cycles in the given permutation on n letters. One
way to establish this result is to use an inductive argument relying on an analysis of
how cycles multiply [1]. Another approach, employed in [4], restates the problem in
the language of graph theory and makes use of the fact that a connected graph
with n vertices must have at least n − 1 edges.

The purpose of this note is to provide an alternate derivation of this result that
uses only elementary linear algebra. The very answer, n − r, seems to suggest
some dimension counting involving complementary spaces, and, indeed, our
argument takes advantage of orthogonality and the Gram-Schmidt process.

The linear algebraic connection is a natural one. Elements of the symmetric
group S_n permute coordinates in R^n and are often realized as permutation
matrices. More precisely, we regard a permutation σ in S_n as acting on the
Euclidean space R^n by σe_i = e_{σ(i)}, where e_1, e_2, ..., e_n denotes the natural basis
of R^n.

In this setting, transpositions have a simple geometric interpretation. Given the
transposition τ = (i, j) in S_n, i < j, we call the vector e_i − e_j in R^n the vector
associated to τ. Notice that τ sends this vector to its negative e_j − e_i. Further, τ
fixes pointwise the collection of n − 1 vectors {e_k | k ≠ i, j} ∪ {e_i + e_j}, which are
all orthogonal to e_i − e_j. Indeed, these n − 1 vectors form a basis for the subspace
(hyperplane) orthogonal to the vector e_i − e_j; simply put, τ acts as the reflection
through the hyperplane orthogonal to e_i − e_j.

We also attach a subspace to any permutation σ: the fixed point space V_σ,
consisting of all vectors x in R^n with σx = x. This is the eigenspace of σ
corresponding to the eigenvalue λ = 1. It always has positive dimension since, for
example, any vector all of whose components are equal is in V_σ for any σ. As a
matter of fact, it is not difficult to see that the fixed point space is determined by
the cycle structure of the permutation. Note that vectors in V_σ must have their ith
and jth components agreeing whenever i and j occur in a common cycle of σ. It
follows that if σ is written as a product of r disjoint cycles, including trivial cycles
containing only one point, then V_σ is r-dimensional with each cycle of σ
contributing a basis element in a natural way to V_σ.

For example, the permutation σ = (2, 5, 3)(1, 6) in S_7 has a four-dimensional
fixed point space in R^7 that has the vectors e_2 + e_5 + e_3, e_1 + e_6, e_4 and e_7 for a
basis.

We read products of permutations from right to left. Thus, (1, 2, 3, 4, 5) =
(1, 5)(1, 4)(1, 3)(1, 2) expresses a five-cycle as a product of four transpositions. In
like fashion, an s-cycle can be written using s − 1 transpositions. So, a
permutation in S_n consisting of r cycles can be written as a product of n − r
transpositions. We are now ready to show that no smaller number of transpositions can be
employed. Note that in counting cycles of a permutation we always include trivial
one-element cycles.


Theorem 1. A permutation in S_n cannot be written as the product of fewer than
n − r transpositions, where r is the number of disjoint cycles in the permutation.


Proof: Suppose σ in S_n is written as σ = τ_1 τ_2 ⋯ τ_k, where the τ_i's are
transpositions. Viewing transpositions as reflections through hyperplanes, let
v_i, i = 1, 2, ..., k, be the vectors associated to these transpositions. Recall that v_i
is a vector orthogonal to the hyperplane determined by τ_i. The Gram-Schmidt
orthogonalization process guarantees the existence of at least n − k linearly
independent vectors that are orthogonal to the subspace spanned by the v_i's.
These n − k vectors thus lie in the intersection of the k hyperplanes determined
by the transpositions and are thus pointwise fixed by each of the transpositions.
Thus these vectors are fixed by σ, and so dim V_σ ≥ n − k. But dim V_σ = r =
number of cycles in σ. The result k ≥ n − r follows. □
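The count n − r and the cycle-by-cycle factorization described above can be checked on small cases. The following is a minimal sketch, not part of the original note; the function names `cycles`, `transpositions` and `compose` are illustrative assumptions, and permutations are represented as dictionaries on {1, ..., n}.

```python
def cycles(perm):
    """Disjoint cycles of perm (a dict i -> sigma(i)), trivial cycles included."""
    seen, out = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x]
        out.append(cyc)
    return out


def transpositions(perm):
    """Factor perm into n - r transpositions; the product is read right to left.

    Each cycle (a, c2, ..., cs) is written as (a, cs)...(a, c3)(a, c2).
    """
    factors = []
    for cyc in cycles(perm):
        a = cyc[0]
        factors.extend((a, c) for c in reversed(cyc[1:]))
    return factors


def compose(factors, n):
    """Multiply a list of transpositions, applying the rightmost factor first."""
    perm = {}
    for i in range(1, n + 1):
        x = i
        for a, b in reversed(factors):
            if x == a:
                x = b
            elif x == b:
                x = a
        perm[i] = x
    return perm


# sigma = (2, 5, 3)(1, 6) in S_7, the example used in the text
sigma = {1: 6, 2: 5, 3: 2, 4: 4, 5: 3, 6: 1, 7: 7}
r = len(cycles(sigma))            # number of cycles, trivial ones included
factors = transpositions(sigma)   # a representation with n - r factors
print(r, len(factors), compose(factors, 7) == sigma)
```

For this σ there are r = 4 cycles, so the factorization uses 7 − 4 = 3 transpositions, which by Theorem 1 is the minimum.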


Whenever σ = τ_1 τ_2 ⋯ τ_k, a product of transpositions, and k is the minimum
number allowed by Theorem 1, we refer to this as a minimal representation of σ.
We now use orthogonality to show that a minimal representation must have
associated vectors that are linearly independent.

Any σ in S_n determines a direct sum decomposition R^n = V_σ ⊕ V_σ^⊥, where V_σ^⊥
denotes the orthogonal complement in R^n of the fixed point space V_σ. If σ = τ_1 τ_2 ⋯ τ_k
is a minimal representation, then dim V_σ = n − k. Now the vectors v_i, i =
1, 2, ..., k, associated to the transpositions τ_i are normal vectors to hyperplanes
H_i. Since τ_i fixes H_i pointwise, the intersection ∩_{i=1}^{k} H_i is a subspace contained in
V_σ. The intersection of these k hyperplanes is the solution space to a k by n
homogeneous system of equations, where the ith equation expresses the
requirement that a vector in H_i must be orthogonal to v_i. Elementary results concerning
rank and solution spaces of systems of equations show that dim(∩_{i=1}^{k} H_i) ≥ n − k,
with equality occurring exactly when the normal vectors v_1, v_2, ..., v_k are linearly
independent. Since the intersection lies in V_σ, which has dimension n − k, equality
must hold. We have derived the following result.


Theorem 2. If the representation σ = τ_1 τ_2 ⋯ τ_k is a minimal one, then the
associated vectors v_1, v_2, ..., v_k are linearly independent and form a basis for V_σ^⊥. □


For example, (1, 6)(3, 4)(4, 6)(1, 3) could not be a minimal representation, due to
the dependence relation e_1 − e_6 = (e_3 − e_4) + (e_4 − e_6) + (e_1 − e_3).
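The independence criterion of Theorem 2 can be tested mechanically by computing the rank of the associated vectors. The sketch below is an illustration only, using exact rational arithmetic; the helper names `associated_vector` and `rank` are assumptions introduced for this example.

```python
from fractions import Fraction


def associated_vector(t, n):
    """The vector e_i - e_j in R^n attached to the transposition t = (i, j)."""
    i, j = t
    v = [0] * n
    v[i - 1] = 1
    v[j - 1] = -1
    return v


def rank(rows):
    """Rank of a list of integer vectors, by exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                f = m[r][col] / m[rk][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


# The product (1, 6)(3, 4)(4, 6)(1, 3) from the text: four factors in S_6
dependent = [(1, 6), (3, 4), (4, 6), (1, 3)]
vectors = [associated_vector(t, 6) for t in dependent]
print(rank(vectors), len(dependent))
```

Here the rank (3) is smaller than the number of factors (4), so the vectors are dependent and the representation cannot be minimal.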

Other reasonably intuitive results about minimal products of transpositions can
be obtained using this approach. For example, a minimal representation σ = τ_1 τ_2 ⋯ τ_k
must respect the cycle structure of σ. For, suppose that some transposition
τ_i = (a, b) was such that a and b belonged to different cycles of σ. Then the
vector v = e_a − e_b would not have inner product zero with the vector w = Σ e_α,
where α ranges through the elements of the cycle of σ containing a. But this
contradicts our result that w is in V_σ, while the associated vector v is in V_σ^⊥. In
particular, no transposition τ_i = (a, b) can have either a or b belonging to a trivial
one-element cycle of σ.
The converse of Theorem 2 is also true, though we omit the arguments.


Congruences Relating the Order
of a Group to the Number 
of Conjugacy Classes 


Bjorn Poonen 


Let G be a finite group, and let |G| denote its order. Let s be the number of
conjugacy classes in G. Burnside, in his 1911 text on the theory of finite groups,
used representation theory to prove that if |G| is odd, then |G| ≡ s (mod 16). (See
p. 295 of [1].) On p. 320 of the same book, he left as an exercise to show that if
every prime dividing |G| is congruent to 1 modulo 4, then |G| ≡ s (mod 32). The
purpose of this note is to show how elementary counting arguments can yield other
congruences in the same spirit. Here is what we will prove:


Theorem. Let m ≥ 2 be an integer. If each prime divisor of |G| is congruent to 1
modulo m, then |G| ≡ s (mod 2m²).


Taking m = 2 in this theorem yields only |G| ≡ s (mod 8), which is weaker than
Burnside's original result. On the other hand, taking m = 4 yields exactly his
exercise.


Proof: Let

T = {(g, h) ∈ G × G | gh ≠ hg}.

For each unordered pair {C_1, C_2} of cyclic subgroups of G, we may consider the set
of (g, h) in G × G such that the subgroups ⟨g⟩, ⟨h⟩ they generate are C_1 and C_2
in some order. Such subsets clearly form a partition of G × G.

Step 1. T is a (disjoint) union of such subsets.
Simply note that


(g, h) ∉ T ⟺ ∀x ∈ ⟨g⟩, ∀y ∈ ⟨h⟩, xy = yx

and the right hand side depends only on the unordered pair {⟨g⟩, ⟨h⟩}.


Step 2. Any such subset S lying in T has cardinality a multiple of 2m².
Let {C_1, C_2} be the pair of cyclic subgroups corresponding to S, and let n_1, n_2 be
their orders. Then


S = {(g_1, g_2) ∈ C_1 × C_2 | g_i generates C_i}
  ∪ {(g_2, g_1) ∈ C_2 × C_1 | g_i generates C_i}.


Since C_i has φ(n_i) generators (where φ is Euler's phi function), each of the two
sets in the union has size φ(n_1)φ(n_2). Moreover, C_1 ≠ C_2 since S ⊆ T, so the
union is disjoint, and


|S| = 2φ(n_1)φ(n_2).


Also since S ⊆ T, neither C_1 nor C_2 equals {1}, so we can pick prime divisors p_i of
n_i. Then p_i divides |G|, so m divides p_i − 1 = φ(p_i), which divides φ(n_i). Thus
2m² divides |S|.


Step 3. |T| = |G|(|G| − s).

If g ∈ G, let C_G(g) denote the centralizer of g in G, i.e., the subgroup
consisting of the h in G which commute with g, and let X_g denote the conjugacy
class of g ∈ G. Since X_g is the orbit of g under the conjugation action of G on G,
and C_G(g) is the stabilizer of g under this action, we have |X_g| = |G|/|C_G(g)|.
Now


|(G × G) \ T| = Σ_{g∈G} (the number of h ∈ G which commute with g)
              = Σ_{g∈G} |C_G(g)|
              = |G| Σ_{g∈G} 1/|X_g|   (by the remark above).


The sum of 1/|X_g| over a conjugacy class is 1, so the sum over all of G is s, and

|(G × G) \ T| = |G|s
|T| = |G|(|G| − s).


Conclusion of Proof: By Steps 1 and 2, 2m² divides |T|, which equals |G|(|G| − s),
by Step 3. But 2m² and |G| are relatively prime, since every prime factor of |G| is
congruent to 1 modulo m (hence greater than m), and m ≥ 2. Hence 2m² divides
|G| − s; i.e., |G| ≡ s (mod 2m²), as desired. □
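The counting identity of Step 3 and the congruence of the theorem can be checked by brute force on a small group of odd order. The sketch below is an illustration only: the choice of group (the nonabelian group of order 21, realized as the affine maps t → at + b mod 7 with a ∈ {1, 2, 4}) and all names are assumptions made for the example, not part of the note. It needs Python 3.8 or later for `pow(a, -1, P)`.

```python
from itertools import product

P = 7               # maps act on the integers mod 7
MULTS = (1, 2, 4)   # the subgroup of order 3 in the units mod 7


def mul(x, y):
    """Compose affine maps: (x * y)(t) = x(y(t))."""
    (a1, b1), (a2, b2) = x, y
    return ((a1 * a2) % P, (a1 * b2 + b1) % P)


def inv(x):
    """Inverse of the affine map t -> a*t + b."""
    a, b = x
    a_inv = pow(a, -1, P)
    return (a_inv, (-a_inv * b) % P)


G = [(a, b) for a in MULTS for b in range(P)]

# Conjugacy classes, each stored as a frozenset of group elements
classes = {frozenset(mul(mul(h, g), inv(h)) for h in G) for g in G}
s = len(classes)

# |T| = number of ordered pairs (g, h) that do not commute
T_size = sum(1 for g, h in product(G, G) if mul(g, h) != mul(h, g))

print(len(G), s, T_size)
```

Every prime divisor of 21 is congruent to 1 modulo 2, so the theorem with m = 2 predicts |G| ≡ s (mod 8); the computed values give 21 − 5 = 16.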


Question. Can the argument in the m = 2 case be modified to obtain Burnside's
original result, that if |G| is odd, then |G| ≡ s (mod 16)?


The Color Invariant for Knots and Links


Peter Andersson 


1. INTRODUCTION. A knot in three dimensions is a closed curve which can be
represented in a diagram with double points where the curve crosses itself
transversely. We indicate in the diagram which part of the curve lies under
and which part lies over at each crossing. A system consisting of several curves is
called a link. Invariants are used to show that two curves cannot be deformed into
each other. All deformations can be performed with the Reidemeister moves
(Figure 1) together with deformations not concerning the crossings. For a proof
see [1].


Figure 1. The Reidemeister moves.


2. THE COLOR INVARIANT. An arc is a piece of the curve between two 
undercrossings. Some other piece of the curve can pass under the arc. 


Definition 2.1. A knot or link K can be colored mod n if there are an integer n and
integers m_i, one for each arc in some projection, such that the following holds for all crossings:


m_1 + m_2 ≡ 2m_3 (mod n)    (*)


where m_1 and m_2 are the integers associated with the arcs going under the
crossing, m_3 is the integer associated with the overcrossing arc and n ≥ 2. It is also
required that there are colors m_i in at least two different equivalence classes.
See [2].

Another formulation of this is that the colors lie symmetrically at the periphery 
of a circle and that the color of the incoming undercrossing is reflected in the 
diameter where the color of the overcrossing lies. For n = 3 this means that the 
color at an undercrossing changes if the overcrossing has a different color, and 
remains the same if it has the same color. We have an odd number of colors at 
each crossing (Figure 2). See [3] and [4]. 


Figure 2. 


We have the following well-known theorem (see [2], [3] and [4]):


Theorem 2.2. Let K be a knot or link which can be colored mod n. Then every
projection of K can be colored mod n.


Proof: We check the Reidemeister moves by solving the equations or looking at 
the circle: 


R I: Every arc involved has the same color. 


R II: An arc is added or removed. The color of the incoming arc at one 
undercrossing is reflected back at the other crossing. We can lose one color this 
way, but it takes at least two different colors to change color at a crossing, so we 
still have more than one color present. 


R III: At most one arc changes color; all colors of the arcs out of the three
crossings remain the same. □


Corollary 2.3. If a knot can be colored mod n then it cannot be deformed to an
unknotted curve.


Proof: Assume that the knot could be deformed to an unknotted curve. Then by 
Theorem 2.2 the unknotted curve without crossings could be colored mod n, but a 
curve with no crossings and only one arc cannot have more than one color. We 
require that at least two colors are present in a colored projection. This gives a 
contradiction. O 


Corollary 2.4. There exists a knot which cannot be deformed to an unknotted curve.


Figure 3. The trefoil knot (n = 3).


Proof: The trefoil knot can be colored mod 3 (figure 3). By Corollary 2.3 this knot 
cannot be deformed to an unknotted curve. O 


Corollary 2.5. If a link is splittable then it can be colored mod n, n ≥ 2. See
Nanyes [2].


Proof: In a split link the components can be colored with two different colors. By
Theorem 2.2 every projection of the link can be colored mod n, n ≥ 2. □


Example. The Borromean rings (Figure 4). The rings are not pairwise linked. It is
easy to see that it is impossible to color the rings mod 3. By Corollary 2.5 this
shows that they cannot be split. It is possible, as in the right picture, to color the
rings mod 4. We also note that the link is alternating.


Figure 4. The Borromean rings. 


To decide whether a knot or link K with k crossings can be colored is equivalent to
solving a system of linear equations


Cx = nb    (**)


where C is the coefficient matrix corresponding to n = 0 in the color relations (*),
b and x are integer vectors and n is an integer.

Each row in the k × k matrix C corresponds to a crossing and consists of the
elements 2, −1, −1 and k − 3 zeroes. If, as in R I, we have a loop, the row becomes
1, −1, and zeroes. The columns correspond to the arcs and consist of the elements
−1, −1, as many twos as the number of curve pieces the arc crosses over, and
zeroes. Addition of the columns gives 0, so det(C) = 0.

Let C₋ be a (k − 1) × (k − 1) submatrix of C obtained by deleting one row and
one column that have a non-zero element in common.

To solve (**) we can solve C₋x = nb₋ by Cramer's rule. Equations (**) have an
integer solution if we choose n = |det(C₋)|. We also need more than one color in
the solution to show that the knot or link can be colored.


Example. The knot 8₂₀ (Figure 5).


Figure 5. The knot 8₂₀.


With b = (1, 0, 0, 0, 0, 0, 0, −1)^T and n = |det(C₋)| = 9 we see that (**) has a
solution (x_1, x_2, ..., x_8) = (7, 2, −3, 1, 8, 3, 5, 0)^T. So the knot can be colored
mod 9.


3. ALTERNATING KNOTS AND LINKS 


Definition 3.1. A projection of a knot or link is said to be alternating if the
crossings alternate over-under-over-under-... as one goes along the curve. An
alternating projection is reduced if no two of the four local regions at a crossing
belong to the same region in the diagram.

If we have a reduced alternating projection, the columns of the matrix C, as well
as the rows, consist of (2, −1, −1, 0, ...), provided we have no simple loops
attached to some arc. It is clear that it is possible to write C with the twos as
diagonal elements.


Lemma 3.2. Let det(C₋) be a subdeterminant of the matrix C in (**), written in the
form above, with one deleted 2, for a reduced alternating knot or link. Then det(C₋)
is unambiguous, that is, it does not depend on which 2 is deleted.


Proof: If all rows in C_ are summed to a row, the deleted row in C is obtained 
with different sign. The same holds for the columns. If the new row and column is 
multiplied by —1 we get the same matrix as if we deleted the other row and 
column. The determinant is unchanged. O 


Theorem 3.3. Let K be a reduced alternating projection of a knot or link with k 
crossings. Then K can be colored mod n, for some n > 1. 


Before we prove the theorem we need to define a matrix. 


Definition 3.4. Let C^k_+ be a matrix obtained from a submatrix C^k of C in (**),
for an alternating knot or link with k crossings, by replacing with 1 an element in a
column with one non-zero element, in a row with only two non-zero elements,
and replacing the rest of the elements in its column or row with zeroes. The
replaced element is supposed to be non-zero or to be in a row which has a
non-zero element outside C^k.


Proof: (Theorem 3.3) Suppose K is a knot. We get two cases.
Case 1: C^k symmetric. Suppose the following holds for each C^{k−1} and C^{k−2}:


|C^{k−1}| = k − 1, |C^{k−2}| = k − 2. This is true for k = 4.
Develop |C^k| by a row with only one −1:


|C^k| = 2|C^{k−1}| − |C^{k−2}| = 2(k − 1) − (k − 2) = k.

It does not matter in which position the —1 is, the sign is as above. By Cramer’s 
rule (**) has an integer solution for n = k. If we take b in (**) with one 1 in the 
row we develop and the rest zeroes, we see that the colors k — 1 and k — 2 are 
represented in the solution. Induction gives the theorem for knots with sym- 
metric C. 


Case 2: C not symmetric. Suppose |C*~'| > k — 1, |Cf7?| > k — 2 and |C*7'|, 
ICf *| > 0. This holds for k = 4. As above we develop by a row with only two 
non-zero elements: 


|C^k| = 2|A| + |B| = 2|A| − |A| + |A_+| = |A| + |A_+|.

Addition of the columns in B gives |B| = −|A| + |A_+|, where A_+ is defined
analogously with Definition 3.4, with 1 in the same row as the −1 outside the
determinant.
For different a we have two cases: 


Gi) a =0: |C£| = |Ck-3] + [CEU] + [CK > kk -14+141=k +1. 
Gi) a = -1: |C£| = Ce!) + ICR > k-14+1=k. 


Now we need to show |C^k_+| > 0. We have three cases with 3 to 5 elements
deleted during development. As above we end up with a sum of positive terms.
Take b as in Case 1. We find that the determinant is bigger when b is in the
column of the 2 than when b is in the column of the −1. There is more than one
color in this solution and induction gives the theorem for knots.

The arguments above work equally well for alternating links except if we have a
simple loop attached to an arc. In this case we simply give this loop the same color
as the arc and remove it from C. It is easy to find that two simple loops attached to
each other (the Hopf link) can be colored mod n, n even (Figure 6). In separated
links the columns in the sub-determinants could be linearly dependent and
det(C₋) = 0. If K is separated apply Corollary 2.5. □


Figure 6. Simple loop; Hopf link (n even).


Corollary 3.5. Alternating knots cannot be deformed to an unknotted curve. 
Proof: Follows directly from Theorem 3.3 and Corollary 2.3. O 


This is a new and elementary proof of a well known result which also follows 
from the behavior of the Jones polynomial. See [3]. 


Example. The knot 5₁ (Figure 7). C is symmetric and the knot can be colored
mod 5.


Figure 7. The knot 5₁.


Example. The figure eight knot with four crossings (Figure 8).

$$C = \begin{pmatrix} 2 & -1 & 0 & -1 \\ 0 & 2 & -1 & -1 \\ -1 & -1 & 2 & 0 \\ -1 & 0 & -1 & 2 \end{pmatrix}$$

$$|C^3| = \begin{vmatrix} 2 & -1 & 0 \\ 0 & 2 & -1 \\ -1 & -1 & 2 \end{vmatrix} = |C^2| + |C^2_+| = 3 + 2 = 5.$$

Take b₋ = (1, 0, 0)^T ⇒ x_1 = 3, x_2 = 1, x_3 = 2, x_4 = 0.


Figure 8. The figure eight knot.


In both examples above the knots can be colored mod 5 and they cannot be
unknotted.
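The figure eight computation can be repeated mechanically. The following is a minimal sketch under stated assumptions: the matrix C is the one given in the example, the function name `det` is introduced here for illustration, and exact rational elimination stands in for the row development used in the text.

```python
from fractions import Fraction


def det(matrix):
    """Determinant of a square integer matrix by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0                      # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign                  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return int(sign * result)


# Crossing/arc matrix of the figure eight knot, as in the example
C = [[2, -1, 0, -1],
     [0, 2, -1, -1],
     [-1, -1, 2, 0],
     [-1, 0, -1, 2]]

C_minus = [row[:3] for row in C[:3]]   # delete the last row and column
n = abs(det(C_minus))                  # modulus for the coloring

x = [3, 1, 2, 0]                       # the colors found in the example
residues = [sum(c * xi for c, xi in zip(row, x)) % n for row in C]
is_coloring = all(r == 0 for r in residues) and len({xi % n for xi in x}) >= 2

print(det(C), n, is_coloring)
```

The full matrix C is singular, the minor has determinant 5, and the colors (3, 1, 2, 0) satisfy every crossing relation mod 5 while using more than one color.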


4. GENERALIZATIONS OF THE COLOR INVARIANT. The color invariant can
be generalized if we orient the curve and the color relations (*) are replaced by:

x·m_1 + y·m_2 ≡ (x + y)·m_3 (mod n),    (***)


where m_1 is the integer associated with the outgoing arc in a (+)-crossing or the
incoming arc in a (−)-crossing and m_2 with the other undergoing arc, m_3 is
associated with the overcrossing arc and x, y, n are integers (Figure 9).


Figure 9. A (+)-crossing to the left and a (−)-crossing to the right.


If n is a prime, i.e., we have a field, n is an invariant as above. The proof of this 
is a straightforward generalization of the case x = y = 1 and is left as an exercise. 


Example. With x = 1 and y = 2 we find that for the figure eight knot n = 11 and
for the knot 5₁ n = 31. So the knots cannot be deformed to each other.

If we put t = −x/y in (***) and calculate det(C₋), where C₋ is a submatrix of
C in (**), we formally get the classical Alexander polynomial (see [4], pp.
201-203).


Catherine C. McGeoch


How do you invent a new algorithm for a computational problem? The first source 
of inspiration is humankind, of course: to develop a sorting algorithm, think about 
how you would sort a deck of playing cards, or a stack of 200 student papers, and 
try to write that process down formally. But introspection is not enough. The 
greatest algorithmic discoveries represent surprising departures from the usual way 
of doing things. 

One of the early landmark events in computer science was Volker Strassen's
1968 discovery that two n × n matrices could be multiplied using fewer than n³
scalar multiplications. His algorithm uses 7n^{log₂ 7} − 6n² scalar arithmetic
operations where log₂ 7 is about 2.808.

Strassen’s algorithm is an example of the divide-and-conquer paradigm: to solve 
a problem efficiently, divide it into independent subproblems, recursively solve the 
subproblems, and recombine the subproblem solutions. Computer scientists have 
come to recognize about a half-dozen algorithm paradigms, which can guide the 
search for new algorithms much in the way that Polya’s heuristic strategies 
(Analogy, Decomposition, Generalization, Induction, etc.) can guide the search for 
new mathematical results [3]. This column will present Strassen’s method and a 
general technique for analyzing divide-and-conquer algorithms. 


THE WAY WE'VE ALWAYS DONE IT. Let X and Y be two n × n matrices and
let Z be their product. The entry in the ith row and jth column of X is denoted
x_ij, and similarly for Y and Z. The product Z is defined by z_ij = Σ_{k=1}^{n} x_ik · y_kj.
The usual method for calculating Z could be written down as follows.


[1] Set z_ij = 0 for all pairs (i, j)
[2] For i ranging from 1 to n, and
[3]   For j ranging from 1 to n, and
[4]     For k ranging from 1 to n, do
[5]       Add x_ik · y_kj to z_ij


To analyze this method we will count the total number of scalar arithmetic
operations. Line [5] is performed n³ times, once for each distinct triple (i, j, k),
and it contains one scalar multiply and one scalar add. Therefore the total number
of arithmetic operations is 2n³. Note that by changing line [1] to z_ij = x_i1 · y_1j, we
could let k go from 2 to n and save one addition per entry of Z. Then the total
number of operations would be 2n³ − n².
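The usual method and its operation count can be sketched directly. This is an illustration only; the function name `classic_multiply` and the representation of matrices as lists of rows are assumptions, and the count follows the unoptimized version (one multiply and one add per execution of line [5]).

```python
def classic_multiply(X, Y):
    """Multiply two n x n matrices the usual way; also count scalar operations."""
    n = len(X)
    Z = [[0] * n for _ in range(n)]      # line [1]: set every z_ij to 0
    ops = 0
    for i in range(n):                   # line [2]
        for j in range(n):               # line [3]
            for k in range(n):           # line [4]
                Z[i][j] += X[i][k] * Y[k][j]   # line [5]: one multiply, one add
                ops += 2
    return Z, ops


Z, ops = classic_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(Z, ops)
```

For n = 2 the count is 2n³ = 16 operations; starting each entry from x_i1 · y_1j instead of 0 would save n² = 4 additions.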


*I came, I divided, I conquered.

DIVIDING WITHOUT CONQUERING. Before describing Strassen's method we
will develop a simpler divide-and-conquer strategy. Actually this strategy should be 
called divide-and-be-conquered because it will turn out to be worse than the classic 
method; but it will serve to introduce some useful notation and the analysis 
technique. 

Assume for now that n is a power of 2, so n = 2^k for some k ≥ 0. The n × n
matrix Z can be divided into four n/2 × n/2 matrices Z_11, Z_12, Z_21, and Z_22,
located in the upper left, upper right, lower left, and lower right corners
respectively. We will call these smaller matrices the quadrants of Z. With X and Y
divided similarly, it holds that


Z_11 = X_11 · Y_11 + X_12 · Y_21
Z_12 = X_11 · Y_12 + X_12 · Y_22
Z_21 = X_21 · Y_11 + X_22 · Y_21
Z_22 = X_21 · Y_12 + X_22 · Y_22.    (1)

Thus for n a power of 2 we can obtain the product of X and Y recursively,
using eight multiplications and four additions of quadrant matrices. When n = 1
the single entry of Z is computed by scalar multiplication. This recursive definition
suggests a procedure for matrix multiplication that is sketched below.


[0] Procedure Multiply(X, Y, Z, n) calculates Z as follows:
[1] If n = 1 then let z_11 = x_11 · y_11 and exit the procedure.
[2] Otherwise, do the following:
[3]   Divide X into quadrants X_11, X_12, X_21, X_22.
[4]   Divide Y into quadrants Y_11, Y_12, Y_21, Y_22.
[5]   Apply Multiply(X_11, Y_11, T_1, n/2) with result T_1.
[6]   Apply Multiply(X_12, Y_21, T_2, n/2) with result T_2.
[7..12] Apply Multiply six more times with results T_3 ... T_8.
[13]  Add pairs of matrices from T_1 ... T_8 as in (1) to obtain Z_11, Z_12, Z_21
      and Z_22.
[14]  Combine quadrants to form Z.
[15] Exit the procedure.


This definition deserves some explanation for those unfamiliar with computer
programming. The Multiply procedure can be applied to matrices X, Y, and Z of
arbitrary size n = 2^k; therefore we write it down in terms of parameters
(X, Y, Z, n). The general multiplication process involves several separate
instantiations, or copies, of the procedure. For a given instantiation, the Apply operation on
line [5] creates a new instantiation of Multiply and sets up a correspondence
between (X_11, Y_11, T_1, n/2) and the parameters (X, Y, Z, n), respectively. The new
instantiation performs Multiply on the smaller matrices, producing T_1. Another
instantiation is created on line [6] to calculate T_2, and so forth.

To analyze this algorithm we use a recurrence formula T(n) that describes the
total number of scalar operations required to multiply two n × n matrices. When
n = 1 we have T(1) = 1. For n > 1, line [13] performs four matrix additions on
n/2 × n/2 matrices, for a total of n² scalar additions. The Apply operation occurs
eight times, each time instantiating a procedure for n/2 × n/2 matrices.
Therefore the recurrence is given by


T(1) = 1
T(n) = 8T(n/2) + n²   for n = 2^k, k ≥ 1.

Now let's derive a solution to the above recurrence. We will use the facts that
T(1) = 1 and T(2) = 12. It is convenient to work with a new formula defined by
t(k) = T(2^k); therefore we have

t(k) = 8t(k − 1) + 4^k   for k ≥ 1.

We need to solve

t(k) − 8t(k − 1) = 4^k,    (2)

subject to t(0) = 1, t(1) = 12. Multiply throughout by 4 and substitute k − 1 for k
to obtain

4t(k − 1) − 32t(k − 2) = 4^k   for k > 1.

Subtracting this from (2), we have

t(k) − 12t(k − 1) + 32t(k − 2) = 0   for k > 1.    (3)

We can solve this using a characteristic equation that maps the coefficients of the
recurrence into a polynomial in x:

x² − 12x + 32 = 0.

This polynomial has roots x = 8 and x = 4. It is easy to verify that the following
formula satisfies (3):

t(k) = c_1(8^k) + c_2(4^k)   for k ≥ 0.

The coefficients c_1 and c_2 are determined by initial conditions: here we have
t(0) = 1 = c_1 + c_2 and t(1) = 12 = 8c_1 + 4c_2, which produces c_1 = 2 and c_2 =
−1. Returning to the original notation we have


t(k) = 2 · 8^k − 4^k   for k ≥ 0
T(n) = 2 · 8^{log₂ n} − 4^{log₂ n}   for n = 2^k, k ≥ 0
T(n) = 2n³ − n²   for n = 2^k, k ≥ 0.
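The closed form can be checked against the recurrence numerically. The following is a small sketch, with the function name `T` chosen to match the text; it simply evaluates T(n) = 8T(n/2) + n² for powers of 2 and compares with 2n³ − n².

```python
def T(n):
    """Scalar operations of the eight-multiplication recursion, n a power of 2."""
    if n == 1:
        return 1
    return 8 * T(n // 2) + n * n


# Compare the recurrence with the closed form 2n^3 - n^2 for n = 1, 2, ..., 1024
checks = [(2 ** k, T(2 ** k), 2 * (2 ** k) ** 3 - (2 ** k) ** 2) for k in range(11)]
print(all(t == closed for _, t, closed in checks))
```

The two agree for every power of 2 tested, including the initial values T(1) = 1 and T(2) = 12 used to fix c_1 and c_2.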


When n is a power of two this recursive procedure requires exactly the same
number of scalar operations as the standard method. If n is not a power of two we
can find the smallest m = 2^k such that n ≤ m, and embed X and Y in larger
matrices of size m × m, padding with 0's as needed. In this case the recursive
procedure is worse than the original method!


Strassen's Algorithm. Strassen's matrix multiplication algorithm is similar to the
recursive strategy above, but it does not calculate the intermediate terms T_1, ..., T_8.
Instead it finds different intermediate terms M_1, ..., M_7, defined by


M_1 = (X_12 − X_22) · (Y_21 + Y_22)
M_2 = (X_11 + X_22) · (Y_11 + Y_22)
M_3 = (X_11 − X_21) · (Y_11 + Y_12)
M_4 = (X_11 + X_12) · Y_22
M_5 = X_11 · (Y_12 − Y_22)
M_6 = X_22 · (Y_21 − Y_11)
M_7 = (X_21 + X_22) · Y_11.

It is straightforward to verify that

Z_11 = M_1 + M_2 − M_4 + M_6
Z_12 = M_4 + M_5
Z_21 = M_6 + M_7
Z_22 = M_2 − M_3 + M_5 − M_7.

Let S(n) denote the total number of scalar arithmetic operations performed by
Strassen's algorithm. We have S(1) = 1 as before. A given instantiation with
parameter n performs eighteen matrix additions and subtractions on quadrants,
for a total of 18n²/4 = 9n²/2 scalar operations. Also, seven Apply steps are
needed to perform matrix multiplication on quadrants. Therefore we have S(n) =
7S(n/2) + 9n²/2.
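The seven products M_1, ..., M_7 and the four recombination formulas can be written out as a short recursive routine. This is a minimal sketch rather than a tuned implementation: it assumes n is a power of 2, stores matrices as lists of rows, and the helper names `add`, `sub`, `split`, `strassen` and `naive` are introduced here for illustration.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Return the quadrants M_11, M_12, M_21, M_22 of a square matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(X, Y):
    """Product of two n x n matrices, n a power of 2, using seven recursive calls."""
    if len(X) == 1:
        return [[X[0][0] * Y[0][0]]]
    X11, X12, X21, X22 = split(X)
    Y11, Y12, Y21, Y22 = split(Y)
    M1 = strassen(sub(X12, X22), add(Y21, Y22))
    M2 = strassen(add(X11, X22), add(Y11, Y22))
    M3 = strassen(sub(X11, X21), add(Y11, Y12))
    M4 = strassen(add(X11, X12), Y22)
    M5 = strassen(X11, sub(Y12, Y22))
    M6 = strassen(X22, sub(Y21, Y11))
    M7 = strassen(add(X21, X22), Y11)
    Z11 = add(sub(add(M1, M2), M4), M6)
    Z12 = add(M4, M5)
    Z21 = add(M6, M7)
    Z22 = sub(add(sub(M2, M3), M5), M7)
    # Glue the quadrants back together row by row
    return ([a + b for a, b in zip(Z11, Z12)] +
            [a + b for a, b in zip(Z21, Z22)])


def naive(X, Y):
    """Reference product by the usual triple loop, used only for comparison."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
Y = [[2, 0, 1, 3], [1, 1, 0, 2], [4, 3, 2, 1], [0, 5, 6, 7]]
print(strassen(X, Y) == naive(X, Y))
```

Each call performs ten additions or subtractions to form the M_i and eight more to form the quadrants of Z, which is the count of eighteen used in the recurrence for S(n).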

The key observation here is that Strassen's algorithm performs only seven
matrix multiplications per instantiation rather than eight. Proceeding with the
analysis, we must solve the relation s(k) − 7s(k − 1) = (9/2)4^k or equivalently
s(k) − 11s(k − 1) + 28s(k − 2) = 0. With the recurrence in this form, we can
apply the characteristic equation x² − 11x + 28 = 0 which has roots x = 7 and
x = 4. Therefore we have s(k) = c_1 7^k + c_2 4^k, with coefficients determined by
initial conditions s(0) = 1 and s(1) = 25. The result is s(k) = 7 · 7^k − 6 · 4^k, so
we obtain


S(n) = 7 · 7^{log₂ n} − 6 · 4^{log₂ n}
S(n) = 7n^{log₂ 7} − 6n²,   for n = 2^k, k ≥ 0


where log₂ 7 ≈ 2.808.

Using the approximation log₂ 7 ≈ 2.808 and treating n as a real variable, the
inequality 7n^{2.808} − 6n² < 2n³ − n² first holds near n = 668; among powers of 2,
to which the formula strictly applies, it first holds at n = 1024. If you ever
need to multiply two 1024 × 1024 matrices you could save about 175 million scalar
operations (with about 1.97 billion remaining) by using Strassen's algorithm
instead of the usual method. Although the algorithm can be implemented so that
no actual costs are incurred in lines [3], [4], and [14], the costs of instantiating new
procedures and of handling cases where n is not a power of 2 combine to make
Strassen's algorithm more of theoretical than practical interest.
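The comparison for n = 1024 can be reproduced exactly from the two counts. The sketch below is an illustration with assumed function names `classic_ops` and `strassen_ops`; it evaluates 2n³ − n² and the recurrence S(n) = 7S(n/2) + 9n²/2 in integer arithmetic for powers of 2.

```python
def classic_ops(n):
    """Scalar operations of the usual method: 2n^3 - n^2."""
    return 2 * n ** 3 - n ** 2


def strassen_ops(n):
    """S(n) = 7 S(n/2) + 9 n^2 / 2 with S(1) = 1, for n a power of 2."""
    if n == 1:
        return 1
    return 7 * strassen_ops(n // 2) + 9 * n * n // 2


# First power of 2 at which Strassen's count drops below the usual count
crossover = next(2 ** k for k in range(1, 20)
                 if strassen_ops(2 ** k) < classic_ops(2 ** k))

saved = classic_ops(1024) - strassen_ops(1024)
print(crossover, strassen_ops(1024), saved)
```

With exact arithmetic, S(1024) = 7¹¹ − 6 · 4¹⁰ and the saving over the usual method is the difference between the two counts at n = 1024.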

Strassen’s discovery prompted an intensive worldwide search for even better 
matrix multiplication algorithms. The first improvement came ten years later, when 
Victor Pan showed that the leading exponent could be lowered to 2.795 (see [2] for 
a discussion of research on this problem up to 1982). Currently the best known 
algorithm has an exponent of 2.376 [1]. It is known that the exponent must be at least
2; obtaining either a better multiplication algorithm or a higher lower bound 
remains one of the most famous open problems in algorithm analysis. 


Addendum. Try to invent a divide-and-conquer algorithm for sorting n numbers. 
At least two such algorithms are known; you can read about Quicksort and 
Mergesort in any textbook on algorithms. 


ACKNOWLEDGMENT. I thank David Armacost for his help with the title. 


THE EVOLUTION OF...
Edited by Abe Shenitzer
Mathematics, York University, North York, Ontario M3J 1P3, Canada


Part I. Topology and Abstract Algebra
as Two Roads
of Mathematical Comprehension*

Unterrichtsblätter für Mathematik und Naturwissenschaften 38, 177-188 (1932). (A lecture in the
summer course of the Swiss Society of Gymnasium Teachers, given in Bern, in October 1931.)


Hermann Weyl 


We are not very pleased when we are forced to accept a mathematical truth by 
virtue of a complicated chain of formal conclusions and computations, which we 
traverse blindly, link by link, feeling our way by touch. We want first an overview of 
the aim and of the road; we want to understand the idea of the proof, the deeper 
context. A modern mathematical proof is not very different from a modern 
machine, or a modern test setup: the simple fundamental principles are hidden 
and almost invisible under a mass of technical details. When discussing Riemann 
in his lectures on the history of mathematics in the 19th century, Felix Klein said: 


Undoubtedly, the capstone of every mathematical theory is a convincing proof of all of its 
assertions. Undoubtedly, mathematics inculpates itself when it foregoes convincing proofs. But 
the mystery of brilliant productivity will always be the posing of new questions, the anticipation 
of new theorems that make accessible valuable results and connections. Without the creation of 
new viewpoints, without the statement of new aims, mathematics would soon exhaust itself in the 
rigor of its logical proofs and begin to stagnate as its substance vanishes. Thus, in a sense, 
mathematics has been most advanced by those who distinguished themselves by intuition rather 
than by rigorous proofs. 


The key element of Klein’s own method was an intuitive perception of inner 
connections and relations whose foundations are scattered. To some extent, he 
failed when it came to a concentrated and pointed logical effort. In his commemo- 
rative address for Dirichlet, Minkowski contrasted the minimum principle that 
Germans tend to name for Dirichlet (and that was actually applied most compre- 
hensively by William Thomson) with the true Dirichlet principle: to conquer 
problems with a minimum of blind computation and a maximum of insightful 
thoughts. It was Dirichlet, said Minkowski, who ushered in the new era in the 
history of mathematics. 

*The original German version of this article is found in vol. 3, pp. 348-358, of the four-volume
edition of Hermann Weyl’s collected works published by Springer-Verlag in 1968. The translation is by
Abe Shenitzer.


What is the secret of such an understanding of mathematical matters, what does
it consist in? Recently, there have been attempts in the philosophy of science to
contrast understanding, the art of interpretation as the basis of the humanities,
with scientific explanation, and the words intuition and understanding have been
invested in this philosophy with a certain mystical halo, an intrinsic depth and
immediacy. In mathematics, we prefer to look at things somewhat more soberly. I 
cannot enter into these matters here, and it strikes me as very difficult to give a 
precise analysis of the relevant mental acts. But at least I can single out, from the 
many characteristics of the process of understanding, one that is of decisive 
importance. One separates in a natural way the different aspects of a subject of 
mathematical investigation, makes each accessible through its own relatively nar- 
row and easily surveyable group of assumptions, and returns to the complex whole 
by combining the appropriately specialized partial results. This last synthetic step 
is purely mechanical. The great art is in the first, analytic, step of appropriate 
separation and generalization. The mathematics of the last few decades has 
revelled in generalizations and formalizations. But to think that mathematics 
pursues generality for the sake of generality is to misunderstand the sound truth 
that a natural generalization simplifies by reducing the number of assumptions and 
by thus letting us understand certain aspects of a disarranged whole. Of course, it 
can happen that different directions of generalization enable us to understand 
different aspects of a particular concrete issue. Then it is subjective and dogmatic 
arbitrariness to speak of the true ground, the true source of an issue. Perhaps the 
only criterion of the naturalness of a severance and an associated generalization is 
their fruitfulness. If this process is systematized according to subject matter by a 
researcher with a measure of skill and “sensitive fingertips” who relies on all the
analogies derived from his experience, then we arrive at axiomatics, which today is
an instrument of concrete mathematical investigation rather than a method for the
clarification and “deep-laying” of foundations.

In recent years mathematicians have had to focus on the general and on 
formalization to such an extent that, predictably, there have turned up many 
instances of cheap and easy generalizing for its own sake. Pólya has called it
generalizing by dilution. It does not increase the essential mathematical substance.
It is much like stretching a meal by thinning the soup. It is deterioration rather
than improvement. The aged Klein said: “Mathematics looks to me like a store
that sells weapons in peacetime. Its windows are replete with luxury items whose 
ingenious, artful and eyecatching execution delights the connoisseur. The true 
origin and purpose of these objects—the strike that defeats the enemy—have 
receded into the background and have been all but forgotten.” There is perhaps 
more than a grain of truth in this indictment, but, on the whole, our generation 
regards this evaluation of its efforts as unjust. 

There are two modes of understanding that have proved, in our time, to be 
especially penetrating and fruitful. The two are topology and abstract algebra. A 
large part of mathematics bears the imprint of these two modes of thought. What 
this is attributable to can be made plausible at the outset by considering the 
central concept of real number. The system of real numbers is like a Janus head 
with two oppositely directed faces. In one respect it is the domain of the 
operations + and × and their inverses, in another it is a continuous manifold, and
the two are continuously related. One is the algebraic and the other is the 
topological face of numbers. Since modern axiomatics is simpleminded and (unlike 
modern politics) dislikes such ambiguous mixtures of peace and war, it made a 
clean break between the two. The notion of size of number, expressed in the 
relations < and >, occupies a kind of intermediate relation between algebra 
and topology. 

Investigations of continua are purely topological if they are restricted to just 
those properties and differences that are unchanged by arbitrary continuous 
deformations, by arbitrary continuous mappings. The mappings in question need
only be faithful to the extent to which they don’t collapse what is distinct. Thus it is 
a topological property of a surface to be closed like the surface of a sphere or open 
like the ordinary plane. A piece of the plane is said to be simply connected if, like 
the interior of a circle, it is partitioned by every crosscut. On the other hand, an 
annulus is doubly connected because there exists a crosscut that does not partition 
it but every subsequent crosscut does. Every closed curve on the surface of a 
sphere can be shrunk to a point by means of a continuous deformation, but this is 
not the case for a torus. Two closed curves in space can be intertwined or not. 
These are examples of topological properties or dispositions. They involve the 
primitive differences that underlie all finer differentiations of geometric figures. 
They are based on the single idea of continuous connection. References to a 
particular structure of a continuous manifold, such as a metric, are foreign to 
them. Other relevant concepts are limit, convergence of a sequence of points to a 
point, neighborhood and continuous line. 

After this preliminary sketch of topology I want to tell you briefly about the 
motives that have led to the development of abstract algebra. Then I will use a 
simple example to show how the same issue can be looked at from a topological 
and from an abstract-algebraic viewpoint. 

All a pure algebraist can do with numbers is apply to them the four operations 
of addition, subtraction, multiplication and division. If a system of numbers is a 
field, that is, if it is closed under these operations, then the algebraist has no 
means of going beyond it. The simplest field is the field of rationals. Another 
example is the field of numbers of the form a + b√2, a, b rational. The well-known
concept of irreducibility of polynomials is relative and depends on the field of
coefficients of the polynomials, namely a polynomial f(x) with coefficients in a
field K is said to be irreducible over K if it cannot be written as a product
$f_1(x) \cdot f_2(x)$ of two non-constant polynomials with coefficients in K. The solution
of linear equations and the determination of the greatest common divisor of two 
polynomials by means of the Euclidean algorithm are carried out within the field 
of the coefficients of the equations and of the polynomials respectively. The 
classical problem of algebra is the solution of an algebraic equation f(x) = 0 with 
coefficients in a field K, say the field of rationals. If we know a root θ of the
equation, then we know the numbers obtained by applying to θ and to the
(presumably known) numbers in K the four algebraic operations. The resulting
numbers form a field K(θ) that contains K. In K(θ), θ plays the role of a
determining number from which all other numbers in K(θ) are rationally derivable.
But many (virtually all) numbers in K(θ) can play the same role as θ. It is
therefore a breakthrough if we replace the study of the equation f(x) = 0 by the
study of the field K(θ). By doing this we eliminate all manner of trivia and
consider at the same time all equations that can be obtained from f(x) = 0 by 
means of Tschirnhausen transformations. The algebraic, and above all the arith- 
metical, theory of number fields is one of the sublime creations of mathematics. 
From the viewpoint of the richness and depth of its results it is the most perfect 
such creation. 
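
The field of numbers a + b√2 mentioned above can be made concrete. The sketch below is an illustration only: the class name `QSqrt2` and its methods are hypothetical, and Python's `Fraction` stands in for the rationals. It shows that the four operations never lead outside the field, division included.

```python
from fractions import Fraction

class QSqrt2:
    """The number a + b*sqrt(2) with rational a and b (illustrative helper)."""

    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __sub__(self, other):
        return QSqrt2(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, using r*r = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        # 1/(a + b r) = (a - b r)/(a^2 - 2 b^2); the denominator is nonzero
        # for every nonzero element because sqrt(2) is irrational.
        d = self.a**2 - 2 * self.b**2
        if d == 0:
            raise ZeroDivisionError("zero has no inverse")
        return QSqrt2(self.a / d, -self.b / d)

    def __truediv__(self, other):
        return self * other.inverse()

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b

    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"

r = QSqrt2(0, 1)                       # the number sqrt(2)
quotient = QSqrt2(1, 1) / QSqrt2(3, -2)
print(r * r, quotient)
```

Since `r * r` equals 2, the polynomial x² − 2 has a root in this field and so splits there into two linear factors, although it is irreducible over the rationals. This is the relativity of irreducibility described above.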

There are fields in algebra whose elements are not numbers. The polynomials in 
one variable, or indeterminate, x, [with coefficients in a field], are closed under 
addition, subtraction and multiplication but not under division. Such a system of 
magnitudes is called an integral domain. The idea that the argument x is a 
variable that traverses continuously its values is foreign to algebra; it is just an 
indeterminate, an empty symbol that binds the coefficients of the polynomial into a 
uniform expression that makes it easier to remember the rules for addition and
multiplication. 0 is the polynomial all of whose coefficients are 0 (not the
polynomial which takes on the value 0 for all values of the variable x). It can be
shown that the product of two nonzero polynomials is ≠ 0. The algebraic viewpoint
does not rule out the substitution for x of a number a taken from the field in
which we operate. But we can also substitute for x a polynomial in one or more
indeterminates y, z, .... Such substitution is a formal process which effects a
faithful projection of the integral domain K[x] of polynomials in x onto K or onto 
the integral domain of polynomials K[y, z,...]; here “faithful” means subject to 
the preservation of the relations established by addition and multiplication. It is 
this formal operating with polynomials that we are required to teach students 
studying algebra in school. If we form quotients of polynomials, then we obtain a 
field of rational functions which must be treated in the same formal manner. This, 
then, is a field whose elements are functions rather than numbers. Similarly, the 
polynomials and rational functions in two or three variables, x, y or x, y, z with 
coefficients in K form an integral domain and field respectively. 

Compare the following three integral domains: the integers, the polynomials in
x with rational coefficients, and the polynomials in x and y with rational
coefficients. The Euclidean algorithm holds in the first two of these domains, and
so we have the theorem: If a, b are two relatively prime elements, then there are
elements p, q in the appropriate domain such that

(*)    $1 = p \cdot a + q \cdot b$.

This implies that the two domains in question are unique factorization domains.
The theorem (*) fails for polynomials in two variables. For example, x − y and
x + y are relatively prime polynomials such that for every choice of polynomials
p(x, y) and q(x, y) the constant term of the polynomial p(x, y)(x − y) +
q(x, y)(x + y) is 0 rather than 1. Nevertheless polynomials in two variables with
coefficients in a field form a unique factorization domain. This example points to
interesting similarities and differences.
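
For the integers, the elements p and q in (*) can be computed by running the Euclidean algorithm and keeping track of the multipliers. The following is a minimal sketch of that extended algorithm; the function name `extended_gcd` is illustrative.

```python
def extended_gcd(a, b):
    """Return (g, p, q) with g = gcd(a, b) and g = p*a + q*b, for positive a, b."""
    old_r, r = a, b
    old_p, p = 1, 0
    old_q, q = 0, 1
    while r != 0:
        t = old_r // r
        # Each remainder is an integer combination of a and b;
        # the same step is applied to its two multipliers.
        old_r, r = r, old_r - t * r
        old_p, p = p, old_p - t * p
        old_q, q = q, old_q - t * q
    return old_r, old_p, old_q

g, p, q = extended_gcd(35, 12)
print(g, p, q, p * 35 + q * 12)
```

The same scheme works for polynomials in one variable over a field, where division with remainder is available. No such p and q can exist for x − y and x + y, since every combination of the two vanishes at x = y = 0.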

There is yet another way of making fields in algebra. It involves neither numbers 
nor functions but congruences. Let p be a prime integer. Identify two integers if 
their difference is divisible by p, or, briefly, if they are congruent mod p. (To “see” 
what this means wrap the real line around a circle of circumference p.) The result 
is a field with p elements. This representation is extremely useful in all of number 
theory. Consider, for example, the following theorem of Gauss that has numerous 
applications: If f(x) and g(x) are two polynomials with integer coefficients such 
that all coefficients of the product f(x) - g(x) are divisible by a prime p, then all 
coefficients of f(x) or all coefficients of g(x) are divisible by p. This is just the 
trivial theorem that the product of two polynomials can be 0 only if one of its 
factors is 0, applied to the field just described as the field of coefficients. This 
integral domain contains polynomials that are not 0 but vanish for all values of the 
argument; one such polynomial is $x^p - x$. In fact, by Fermat’s theorem, we have

$a^p - a \equiv 0 \pmod{p}$.

Cauchy uses a similar approach to construct the complex numbers. He regards the
imaginary unit i as an indeterminate and studies polynomials in i over the reals
modulo $i^2 + 1$, that is he regards two polynomials as equal if their difference is
divisible by $i^2 + 1$. In this way, the actually unsolvable equation $i^2 + 1 = 0$ is
rendered, in some measure, solvable. Note that the polynomial $i^2 + 1$ is prime
over the reals. Kronecker generalized Cauchy’s construction as follows. Let K be a
field and p(x) a polynomial prime over K. Viewed modulo p(x), the polynomials
f(x) with coefficients in K form a field (and not just an integral domain). From an
algebraic viewpoint, this process is fully equivalent to the one described previously,
and can be thought of as the process of extending K to K(θ) by adjoining to K a
root of the equation p(θ) = 0. But it has the advantage that it takes place within
pure algebra and gets around the demand for solving an equation that is actually 
unsolvable over K. 
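
Both constructions in this paragraph are easy to try out. The sketch below is illustrative only: the function names are hypothetical, polynomials are stored as coefficient lists with the lowest degree first, and rational coefficients stand in for the reals. The first function checks Fermat's congruence for every residue; the second multiplies two polynomials and reduces the product modulo a monic polynomial, which for the modulus i² + 1 reproduces the multiplication of complex numbers.

```python
from fractions import Fraction

def fermat_holds(p):
    """True if a^p - a is divisible by p for every residue a = 0, ..., p - 1."""
    return all((pow(a, p, p) - a) % p == 0 for a in range(p))

def mul_mod_poly(f, g, m):
    """Multiply coefficient lists f and g, then reduce modulo the monic polynomial m."""
    prod = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += Fraction(a) * Fraction(b)
    d = len(m) - 1  # degree of the modulus
    # Remove the terms of degree >= d, highest first, by subtracting
    # a suitable multiple of m each time.
    for k in range(len(prod) - 1, d - 1, -1):
        c = prod[k]
        for j in range(d + 1):
            prod[k - d + j] -= c * m[j]
    result = prod[:d]
    return result + [Fraction(0)] * (d - len(result))

modulus = [1, 0, 1]                          # the polynomial 1 + i^2
i_squared = mul_mod_poly([0, 1], [0, 1], modulus)
product = mul_mod_poly([2, 3], [4, -1], modulus)   # (2 + 3i)(4 - i)
print(fermat_holds(7), i_squared, product)
```

Here `i_squared` comes out as −1 and `product` as 11 + 10i, in agreement with ordinary complex arithmetic.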

It is quite natural that these developments should have prompted a purely 
axiomatic buildup of algebra. A field is a system of objects, called numbers, closed 
under two operations, called addition and multiplication, that satisfy the usual 
axioms: both operations are associative and commutative, multiplication is dis- 
tributive over addition, and both operations are uniquely invertible yielding sub- 
traction and division respectively. If the axiom of invertibility of multiplication is 
left out, then the resulting system is called a ring. Now “field” no longer denotes, 
as before, a kind of sector of the continuum of real or complex numbers but a 
self-contained universe. One can apply the field operations to elements of the 
same field but not to elements of different fields. In this process we need not 
resort to artificial abstracting from the size relations < and >. These relations 
are irrelevant for algebra and the “numbers” of an abstract “number field” are not
subject to such relations. In place of the uniform number continuum of analysis we 
now have the infinite multiplicity of structurally different fields. The previously 
described processes, namely adjunction of an indeterminate and identification of 
elements that are congruent with respect to a fixed prime element, are now seen as 
two modes of construction that lead from rings and fields to other rings and fields 
respectively. 

The elementary axiomatic grounding of geometry also leads to this abstract 
number concept. Take the case of plane projective geometry. The incidence axioms 
alone lead to a “number field” that is naturally associated with it. Its elements, the 
“numbers,” are purely geometric entities, namely dilations. A point and a straight 
line are ratios of triples of “numbers” in that field, $x_1 : x_2 : x_3$ and $u_1 : u_2 : u_3$
respectively, such that incidence of the point $x_1 : x_2 : x_3$ on the line $u_1 : u_2 : u_3$ is
represented by the equation


$x_1 u_1 + x_2 u_2 + x_3 u_3 = 0$.


Conversely, if one uses these algebraic expressions to define the geometric terms, 
then every abstract field leads to an associated projective plane that satisfies the 
incidence axioms. It follows that a restriction involving the number field associated 
with the projective plane cannot be read off from the incidence axioms. Here the 
preexisting harmony between geometry and algebra comes to light in the most 
impressive manner. For the geometric number system to coincide with the contin- 
uum of ordinary real numbers one must introduce axioms of order and continuity, 
very different in kind from the incidence axioms. We thus arrive at a reversal of 
the development that has dominated mathematics for centuries and seems to have 
arisen originally in India and to have been transmitted to the West by Arab 
scholars: Up till now, we have regarded the number concept as the logical 
antecedent of geometry, and have therefore approached every realm of magni- 
tudes with a universal and systematically developed number concept independent 
of the applications involved. Now, however, we revert to the Greek viewpoint that 
every subject has an associated intrinsic number realm that must be derived from 
within it. We experience this reversal not only in geometry but also in the new 
quantum physics. According to quantum physics, the physical magnitudes associ- 
ated with a particular physical setup (not the numerical values that they may take 
on depending on its different states) admit of an addition and a non-commutative
multiplication, and thus give rise to a system of algebraic magnitudes intrinsic to it 
that cannot be viewed as a sector of the system of real numbers. 
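
The earlier remark that every abstract field leads to a projective plane satisfying the incidence axioms can be tried on the smallest case. The sketch below is an illustration under a stated assumption: it takes the field with two elements, where each ratio of a triple has exactly one nonzero representative, so that points and lines are simply the seven nonzero triples. The names used are hypothetical.

```python
from itertools import combinations, product

P = 2  # assumption: work over the two-element field {0, 1}

# Over this field every ratio x1:x2:x3 has exactly one nonzero representative,
# so points and lines can both be listed as the nonzero triples.
points = [t for t in product(range(P), repeat=3) if any(t)]
lines = list(points)

def incident(x, u):
    """Point x lies on line u when x1*u1 + x2*u2 + x3*u3 = 0 in the field."""
    return sum(a * b for a, b in zip(x, u)) % P == 0

def lines_through(x, y):
    """All lines incident with both points x and y."""
    return [u for u in lines if incident(x, u) and incident(y, u)]

points_per_line = {u: [x for x in points if incident(x, u)] for u in lines}
two_points_one_line = all(len(lines_through(x, y)) == 1
                          for x, y in combinations(points, 2))
print(len(points), two_points_one_line)
```

The resulting plane has seven points and seven lines, three points on each line, and exactly one line through any two distinct points.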

And now, as promised, I will present a simple example that illustrates the 
mutual relation between the topological and abstract-algebraic modes of analysis. I 
consider the theory of algebraic functions of a single variable x. Let K(x) be the 
field of rational functions of x with arbitrary complex coefficients. Let f(z), more 
precisely f(z; x), be an n-th degree polynomial in z with coefficients in K(x). We 
explained earlier when such a polynomial is said to be irreducible over K(x). This 
is a purely algebraic concept. Now construct the Riemann surface of the n-valued 
algebraic function z(x) determined by the equation f(z;x)=0. Its n sheets 
extend over the x-plane. For easier transformation of the x-plane into the 
x-sphere by means of a stereographic projection we add to the x-plane a point at 
infinity. Like the sphere, our Riemann surface is now closed. The irreducibility of 
the polynomial f is reflected in a very simple topological property of the Riemann 
surface of z(x), namely its connectedness: if we shake a paper model of that 
surface it does not break into distinct pieces. Here you witness the coincidence of a 
purely algebraic and a purely topological concept. Each suggests generalization in 
a different direction. The algebraic concept of irreducibility depends only on the 
fact that the coefficients of the polynomial are in a field. In particular, K(x) can 
be replaced by the field of rational functions of x with coefficients in a preas- 
signed field k which takes the place of the continuum of all complex numbers. On 
the other hand, from the viewpoint of topology it is irrelevant that the surface in 
question is a Riemann surface, that it is equipped with a conformal structure, and 
that it consists of a finite number of sheets that extend over the x-plane. Each of 
the two antagonists can accuse the other of admitting side issues and of neglecting 
essential features. Who is right? Questions such as these, involving not facts but 
ways of looking at facts, can lead to hatred and bloodshed when they touch human 
emotions. In mathematics, the consequences are not so serious. Nevertheless, the 
contrast between Riemann’s topological theory of algebraic functions and Weier- 
strass’ more algebraically directed school led to a split in the ranks of mathemati- 
cians that lasted for almost a generation. 

Weierstrass himself wrote to his faithful pupil H. A. Schwarz: “The more I 
reflect on the principles of function theory—and I do this all the time—the 
stronger is my conviction that this theory must be established on the foundation of 
algebraic truths, and that it is therefore not the right way when, contrariwise, the 
‘transcendent’ (to put it briefly) is invoked to establish simple and fundamental 
algebraic theorems—this is so no matter how attractive are, at a first glance, say, 
the considerations by means of which Riemann discovered so many of the most 
important properties of algebraic functions.” This strikes us now as onesided; 
neither one of the two ways of understanding, the topological or the algebraic, can 
be acknowledged to have unconditional advantage over the other. And we cannot 
spare Weierstrass the reproach that he stopped midway. True, he explicitly 
constructed the functions as algebraic, but he also used as coefficients the 
algebraically unanalyzed, and in a sense unfathomable for algebraists, continuum 
of complex numbers. The dominant general theory in the direction followed by 
Weierstrass is the theory of an abstract number field and its extensions determined 
by means of algebraic equations. Then the theory of algebraic functions moves in 
the direction of a shared axiomatic basis with the theory of algebraic numbers. In 
fact, what suggested to Hilbert his approaches in the theory of number fields was 
the analogy [between the latter] and the state of things in the realm of algebraic 
functions discovered by Riemann by his topological methods. (Of course, when it
came to proofs, the analogy was useless.) 

Our example “irreducible-connected” is typical also in another respect. How
visually simple and understandable is the topological criterion (shake the paper 
model and see if it falls apart) in comparison with the algebraic! The visual 
primality of the continuum (I think that in this respect it is superior to the 1 and 
the natural numbers) makes the topological method particularly suitable for both 
discovery and synopsis in mathematical areas, but is also the cause of difficulties 
when it comes to rigorous proofs. While it is close to the visual, it is also refractory 
to logical approaches. That is why Weierstrass, M. Noether and others preferred 
the laborious, but more solid-feeling, procedure of direct algebraic construction to 
Riemann’s transcendental-topological justification. Now, step by step, abstract 
algebra tidies up the clumsy computational apparatus. The generality of the 
assumptions and axiomatization force one to abandon the path of blind computa- 
tion and to break the complex state of affairs into simple parts that can be handled 
by means of simple reasoning. Thus algebra turns out to be the El Dorado of 
axiomatics. 

I must add a few words about the method of topology to prevent the picture 
from becoming altogether vague. If a continuum, say, a two-dimensional closed 
manifold, a surface, is to be the subject of mathematical investigation, then we 
must think of it as being subdivided into finitely many “elementary pieces” whose
topological nature is that of a circular disk. These pieces are further fragmented by 
repeated subdivision in accordance with a fixed scheme, and thus a particular spot 
in the continuum is ever more precisely intercepted by an infinite sequence of 
nested fragments that arise in the course of successive subdivisions. In the 
one-dimensional case, the repeated “normal subdivision” of an elementary seg- 
ment is its bipartition. In the two-dimensional case, each edge is first bipartitioned, 
then each piece of surface is divided into triangles by means of lines in the surface 
that lead from an arbitrary center to the (old and new) vertices. What proves that a 
piece is elementary is that it can be broken into arbitrarily small pieces by 
repetition of this division process. The scheme of the initial subdivision into 
elementary pieces—to be referred to briefly in what follows as the “skeleton” —is 
best described by labelling the surface pieces, edges and vertices by means of 
symbols, and thus prescribing the mutual bounding relations of these elements. 
Following the successive subdivisions, the manifold may be said to be spanned by 
an increasingly dense net of coordinates which makes it possible to determine a 
particular point by means of an infinite sequence of symbols that play a role 
comparable to that of numbers. The reals appear here in the particular form of 
dyadic fractions, and serve to describe the subdivision of an open one-dimensional 
continuum. Other than that, we can say that each continuum has its own arithmeti- 
cal scheme; the introduction of numerical coordinates by reference to the special 
division scheme of an open one-dimensional continuum violates the nature of 
things, and its sole justification is the practical one of the extraordinary conve- 
nience of the calculational manipulation of the continuum of numbers with its four 
operations. In the case of an actual continuum, the subdivisions can be realized 
only with a measure of imprecision; one must imagine that, as the process of 
subdivision progresses step by step, the boundaries set by the earlier subdivisions 
are ever more sharply fixed. Also, in the case of an actual continuum, the process 
of subdivision that runs virtually ad infinitum can reach only a certain definite 
stage. But in distinction to concrete realization, the localization in an actual 
continuum, the combinatorial scheme, the arithmetical nullform, is a priori determined
ad infinitum; and mathematics deals with this combinatorial scheme alone.
Since the continued subdivision of the initial topological skeleton progresses in 
accordance with a fixed scheme, it must be possible to read off all the topological 
properties of the nascent manifold from that skeleton. This means that, in 
principle, it must be possible to pursue topology as finite combinatorics. For 
topology, the ultimate elements, the atoms, are, in a sense, the elementary parts of 
the skeleton and not the points of the relevant continuous manifold. In particular, 
given two such skeletons, it must be possible to decide if they lead to concurrent 
manifolds. Put differently, it must be possible to decide if we can view them as 
subdivisions of one and the same manifold. 

The algebraic counterpart of the transition from the algebraic equation f(z; x) 
= 0 to the Riemann surface is the transition from the latter equation to the field 
determined by the function z(x); this is so because the Riemann surface is 
uniquely occupied not only by the function z(x) but also by all algebraic functions 
in this field. What is characteristic for Riemann’s function theory is the converse 
problem: given a Riemann surface construct its field of algebraic functions. The 
problem has always just one solution. Since every point g of the Riemann surface 
lies over a definite point of the x-plane, the Riemann surface, as presently 
constituted, is embedded in the x-plane. The next step is to abstract from the 
embedding relation g → x. As a result, the Riemann surface becomes, so to say, a
free-floating surface equipped with a conformal structure and an angle measure. 
Note that in ordinary surface theory we must learn to distinguish between the 
surface as a continuous structure made up of elements of a specific kind, its points, 
and the embedding in 3-space that associates with each point g of the surface, in a 
continuous manner, the point P in space at which g is located. In the case of a 
Riemann surface, the only difference is that the Riemann surface and the embed- 
ding plane have the same dimension. To abstraction from the embedding there 
corresponds, on the algebraic side, the viewpoint of invariance under arbitrary 
birational transformations. To enter the realm of topology we must ignore the 
conformal structure associated with the free-floating Riemann surface. Continuing 
the comparison, we can say that the conformal structure of the Riemann surface is 
the equivalent of the metric structure of an ordinary surface, controlled by the first 
fundamental form, or of the affine and projective structures associated with 
surfaces in affine and projective differential geometry respectively. In the continuum
of real numbers, it is the algebraic operations of + and · that reflect its
structural aspect, and in a continuous group the law that associates with an 
ordered pair of elements their product plays an analogous role. These comments 
may have increased our appreciation of the relation of the methods. It is a 
question of rank, of what is viewed as primary. In topology we begin with the 
notion of continuous connection, and in the course of specialization we add, step 
by step, relevant structural features. In algebra this order is, in a sense, reversed. 
Algebra views the operations as the beginning of all mathematical thinking and 
admits continuity, or some algebraic surrogate of continuity, at the last step of 
specialization. The two methods follow opposite directions. Little wonder that they 
don’t get on well together. What is most easily accessible to one is often most 
deeply hidden to the other. In the last few years, in the theory of representation of 
continuous groups by means of linear substitutions, I have experienced most 
poignantly how difficult it is to serve these two masters at the same time. Such 
classical theories as that of algebraic functions can be made to fit both viewpoints. 
But viewed from these two viewpoints they present completely different sights. 
PROBLEMS AND SOLUTIONS


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


Proposed problems should be sent to the MONTHLY PROBLEMS address given on 
the inside front cover. Please include solutions, relevant references, etc. Three copies 
are requested. 


Solutions of published problems should arrive before October 31, 1995 at the MONTHLY
PROBLEMS address given on the inside front cover. Solutions should be typed
with double spacing, including the problem number and the solver’s name and mailing
address. Two copies suffice. A self-addressed postcard or label should be included if
an acknowledgement is desired.


An asterisk (*) after the number of a problem, or part of a problem, indicates that no
solution is currently available. Partial solutions will be useful in such cases. Otherwise,
the published solution is likely to be based on a solution which is complete and
correct. Of course, an elegant partial solution or a method leading to a more general
result is always useful and welcome. In addition, references to other appearances
of MONTHLY problems or to solutions of these problems in the literature are also
solicited.


PROBLEMS 


10452. Proposed by Seung-Jin Bang, Ajou University, Suwon, Korea. 

Find all values of n, k, a and b (n and k positive integers, n > k, a and b nonzero real
numbers) for which the polynomial $x^n + ax + b$ is divisible by $x^k + ax + b$ in $\mathbb{R}[x]$.

10453. Proposed by Murray S. Klamkin, University of Alberta, Edmonton, Alberta, Canada. 


Prove that the following two properties of the altitudes of an n dimensional simplex are 
equivalent: 
i) the altitudes are concurrent; 
ii) the feet of the altitudes are the orthocenters of their respective faces. 
10454. Proposed by Harry Tamvakis (student), The University of Chicago, Chicago, IL. 
We say that a natural number n is amenable if there exist integers $a_1, a_2, \ldots, a_n$ such that
$$a_1 + a_2 + \cdots + a_n = a_1 a_2 \cdots a_n = n.$$


Find all amenable natural numbers. 
10455. Proposed by Zachary Franco, Texas A&M University, Kingsville, TX.


It is easily seen that a parabola can intersect a circle in at most 4 points. 

(a) Show that there is a number R such that a regular polygon (of any number of sides) 
can intersect a parabola in at most R points. 

(b)* Find the smallest R with this property. 


10456. Proposed by Daniel B. Shapiro, Ohio State University, Columbus, OH. 


Denote the group of invertible n by n matrices with entries in the complex numbers by
$GL(n, \mathbb{C})$. Two such matrices f and g will be said to anticommute if $fg = -gf$. Also let
I denote the identity matrix, which is the unit element of this group.

(a) If $n = 2^m n_0$ with $n_0$ odd, show that there are k elements of $GL(n, \mathbb{C})$ that anticommute
pairwise if and only if $k \le 2m + 1$.

(b) If $n = 2^m$ and $f_1, \ldots, f_{2m}$ anticommute pairwise, show that the set of products
$f_{i_1} f_{i_2} \cdots f_{i_j}$ with $1 \le i_1 < \cdots < i_j \le 2m$ forms a basis for the $2^{2m}$ dimensional space of
all n by n matrices over $\mathbb{C}$. Moreover, in this case each $f_i^2$ is a scalar matrix.


10457. Proposed by Henry Cohn (student), Massachusetts Institute of Technology, Cambridge, 
MA. 


Determine the simple continued fraction of $\left( \dfrac{F_{10n+1}}{F_{10n}} \right)^5$, where $F_k$ denotes the k-th
Fibonacci number.


10458. Proposed by Frank Schmidt, Arlington, VA, and Louis W. Shapiro, Howard University, 
Washington, DC. 


For a finite group G, let cd(G) denote the multiset of irreducible character degrees, and 
let CD(G) denote the underlying set of distinct character degrees. 

(a) Find all n for which the multiset {1, 2, ..., n} appears as cd(G) for some group G.

(b) Find an upper bound for those n for which the set {1, 2, ..., n} appears as CD(G)
for some group G.


NOTES 


(10457) To get started, here are the results for n = 1: $F_{10} = 55$ and $F_{11} = 89$, so
$F_{11}^5 = 5584059449$ and $F_{10}^5 = 503284375$, and the continued fraction of their ratio is
[11, 10, 1, 1, 2188, 1, 17, 10, 1, 4, 11].
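
The n = 1 case quoted in the note can be reproduced with the Euclidean algorithm. The following is a minimal sketch, not part of the problem statement; the helper names `fib` and `continued_fraction` are illustrative choices, and the convention $F_1 = F_2 = 1$ is assumed.

```python
def fib(k):
    """Return the k-th Fibonacci number, assuming F_1 = F_2 = 1 (so F_0 = 0)."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def continued_fraction(num, den):
    """Return the partial quotients [a0, a1, ...] of num/den via the Euclidean algorithm."""
    terms = []
    while den:
        q, r = divmod(num, den)
        terms.append(q)
        num, den = den, r
    return terms


# Case n = 1 of the problem: (F_11 / F_10)^5 = 89^5 / 55^5.
cf_n1 = continued_fraction(fib(11) ** 5, fib(10) ** 5)
print(cf_n1)
```

Changing the arguments to `fib(10 * n + 1)` and `fib(10 * n)` gives data for larger n, which may help in guessing the general pattern the problem asks for.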


SOLUTIONS 


An Algebraic Polar Decomposition 


6668 [1991, 767]. Proposed by Dragomir Ž. Đoković, University of Waterloo, Ontario,
Canada.


If n is even, show that there exists $A \in GL_n(\mathbb{C})$ which cannot be represented in the
form A = QS with $Q \in SO_n(\mathbb{C})$ and S symmetric. (For every n it is known that every
$A \in GL_n(\mathbb{C})$ can be represented as A = QS with $Q \in O_n(\mathbb{C})$ and S symmetric.)
Solution by the proposer. Let M = (7 5): The minimal polynomial of M is $g(\lambda) = (\lambda - 1)^2$,
so its Jordan canonical form is $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Let T be the n by n matrix formed from
m copies (where n = 2m) of M along the diagonal and zeros elsewhere. Then every
eigenvalue of T is +1 and the Jordan canonical form of T has m 2-by-2 blocks. Now let
A = DT, where D = diag(−1, 1, 1, ..., 1). Note that det(A) = −1.

Suppose that A = QS with $Q \in O_n(\mathbb{C})$ and S symmetric. Then $S^2 = A^T A = T^2$,
so every eigenvalue of S is ±1. Using the known Jordan canonical form of T, one sees
that $T^2$ also has Jordan blocks of only even size. This implies that the same is true of
S. In particular, every eigenvalue of S has even multiplicity. Thus det(S) = +1, so that
det(Q) = −1. Thus $Q \notin SO_n(\mathbb{C})$, as required.


Editorial comment. The proposer noted that it is classical that for $A \in GL_n(\mathbb{C})$, one has
A = QS as above. A proof can be found in F. R. Gantmacher, The Theory of Matrices,
Chelsea, 1960, chap. XI, §2. He also noted that this theory was extended to singular
matrices in I. Kaplansky, “Algebraic polar decomposition”, SIAM J. Matrix Anal. Appl. 11
(1990), 213-217, where it is proved that a square matrix A over any algebraically closed
field of characteristic different from 2 can be written as QS if and only if $(AA^T)^k$ and
$(A^T A)^k$ have the same rank for all k.

Robin J. Chapman did not find an element of $GL_n(\mathbb{C})$, but showed that if the singular
matrix B = D(T − I) is written in the form QS, then det(Q) = −1. His method used
the equation $S^2 = B^T B = (T - I)^2 = 0$ to show that the kernel and image of S and the
kernel and image of T − I are all the same subspace of dimension m, and the inner product
of any two elements of this subspace is zero. Then the orthogonal matrix $D^{-1}Q$ takes this
subspace to itself. This forces $D^{-1}Q \in SO_n(\mathbb{C})$.


No other solutions were received. 


A Variant of Prince Rupert’s Problem 


10251 [1992, 782]. Proposed by J. G. Mauldon, Amherst College, Amherst, MA. 


Let C denote the unit cube, and let P be the set of all pairs [a, b] with a and b mutually 
perpendicular line segments contained in C. 


(a) Evaluate $\sup\{ \min\{|a|, |b|\} : [a, b] \in P \}$.


(b) Deduce the area of the largest square, and the volume of the largest regular octahedron, 
that fit into C. 


Solution by Robin J. Chapman, University of Exeter, Exeter, U.K. The answers are:

(a) $\sup\{ \min\{|a|, |b|\} : [a, b] \in P \} = 3/2$.

(b) The largest square fitting into C has area 9/8, and the largest regular octahedron fitting
into C has volume 9/16.


Proof. For convenience, we shall instead consider a cube C′ centered at the origin and having
vertices (±1, ±1, ±1). Consider the three line segments a, b and c centered at the origin
and having respectively an endpoint at (1/2, 1, 1), (−1, −1/2, 1) and (−1, 1, −1/2). These
segments are mutually perpendicular and each has length 3.
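
The claim about these three segments is a finite computation and can be checked directly. The sketch below is only an illustration; the names `endpoints`, `dot` and `segment_length` are chosen here and do not come from the solution.

```python
from itertools import combinations
from math import sqrt

# One endpoint of each segment a, b, c; the other endpoint is the negative,
# since each segment is centered at the origin.
endpoints = [(0.5, 1.0, 1.0), (-1.0, -0.5, 1.0), (-1.0, 1.0, -0.5)]


def dot(p, q):
    """Euclidean inner product of two points in 3-space."""
    return sum(pi * qi for pi, qi in zip(p, q))


def segment_length(p):
    """Length of the segment from -p to p."""
    return 2 * sqrt(dot(p, p))


pairwise_dots = [dot(p, q) for p, q in combinations(endpoints, 2)]
lengths = [segment_length(p) for p in endpoints]
inside_cube = all(abs(coord) <= 1 for p in endpoints for coord in p)

print(pairwise_dots, lengths, inside_cube)
```

All three inner products vanish and each length equals 3, so after rescaling by 1/2 the unit cube contains perpendicular segments of length 3/2.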

To prove (a) we must show that two perpendicular line segments contained in C′ cannot
both have lengths exceeding 3. Assuming this for the moment, we prove (b). Any square
contained in C′ has two perpendicular diagonals of length d, say, and thus has area $d^2/2$.
It follows that the square with diagonals a and b has area 9/2 and is as large as possible
among squares in C′. The largest square in C thus has area 9/8. Similarly, any regular
octahedron contained in C′ has three mutually perpendicular diagonals of length d, say, and
thus volume $d^3/6$. The octahedron with diagonals a, b and c thus has volume 27/6 = 9/2,
and the largest octahedron in C has volume 9/16.

If x and y are in C′ then the coordinates of x − y have absolute value at most 2, and so
±(x − y)/2 ∈ C′. It follows that given a line segment a contained in C′ there is a parallel
line segment a′ through the origin also contained in C′. Hence we may restrict attention to
line segments through the origin, and as we can extend such segments to the boundary, we
may restrict attention to those line segments through the origin whose endpoints lie on the
surface of the cube.

Suppose that we fix such a segment x of length exceeding 3. By symmetry, we can
assume that an endpoint of x is (1, a, b) with $0 \le a \le b \le 1$. Since |x| > 3, it follows that
a > 1/2. If y is a segment perpendicular to x, we need to consider three possibilities for an
endpoint P of y: (−1, u, v), (u, −1, v) and (u, v, −1), where $|u|, |v| \le 1$.

First suppose that P = (−1, u, v). Since y is perpendicular to x, we have au + bv = 1.
In the (u, v)-plane, we are considering points on this line in the square $|u|, |v| \le 1$, and
we want to maximize $u^2 + v^2$ so as to find the maximum possible length for y. This
maximum occurs where the line meets the boundary of the square, and this happens at
the points u = 1, v = (1 − a)/b and u = (1 − b)/a, v = 1. It is easy to check that
$1 \ge (1 - a)/b \ge (1 - b)/a \ge 0$ when $1/2 < a \le b \le 1$, and this gives

$$|y| = 2\sqrt{1 + u^2 + v^2} \le 2\sqrt{2 + \left( \frac{1 - a}{b} \right)^2}.$$

If |y| > 3, this yields (1 − a)/b > 1/2 and thus a < 1 − b/2. We also have $a^2 + b^2 > 5/4$
since |x| > 3, and this yields

$$\left( 1 - \frac{b}{2} \right)^2 + b^2 > a^2 + b^2 > \frac{5}{4}.$$

There is no solution to this inequality in the range $0 \le b \le 1$.

Next, suppose P = (u, −1, v). Since y is perpendicular to x, u + bv = a. In the (u, v)-plane,
this meets the boundary of the square $|u|, |v| \le 1$ at the points u = 1, v = (a − 1)/b
and u = a − b, v = 1. In the range $0 \le a \le b \le 1$, we always have $(1 - a)/b \ge b - a \ge 0$,
and thus

$$|y| = 2\sqrt{1 + u^2 + v^2} \le 2\sqrt{2 + \left( \frac{1 - a}{b} \right)^2}.$$

As in the previous case, this leads to a contradiction if |y| > 3.

Finally, suppose P = (u, v, −1). Then u + av = b and the points on the boundary of
the square are u = 1, v = (b − 1)/a and u = b − a, v = 1. This time, both points must be
considered. As before, |y| > 3 requires that either b − a > 1/2 or (1 − b)/a > 1/2, while
$a^2 + b^2 > 5/4$. If b − a > 1/2, we have

$$\left( b - \frac{1}{2} \right)^2 + b^2 > \frac{5}{4},$$

and if (1 − b)/a > 1/2, we have

$$4(1 - b)^2 + b^2 > \frac{5}{4}.$$

Neither of these quadratic inequalities has solutions for $1/2 < b \le 1$.


Editorial comment. The problem of finding the largest square inscribed in a unit cube is 
quite old. In H. Croft, K. Falconer & R. Guy, Unsolved Problems in Geometry, Springer, 
1991, it appears as Problem B4 with some unsolved generalizations. The original question 
is attributed to Prince Rupert (1619-1682), with a solution by Pieter Nieuwland in the 
eighteenth century. The inequality of part (a) allows a proof not overly dependent on
geometric intuition.


Raphael M. Robinson generalized part (a) to two perpendicular segments in a unit n-cube.
The desired supremum $d_n$ is

$$d_n = \begin{cases} \sqrt{n} & \text{if } n \text{ is even,} \\ \sqrt{n - 3/4} & \text{if } n \text{ is odd.} \end{cases}$$


Solved also by I. Kastanas, O. P. Lossers (The Netherlands), R. M. Robinson, Anchorage Math Solutions Group, 
and the proposer. 


A Fibonacci Series 


10262 [1992, 873]. Proposed by Dean Clark, University of Rhode Island, Kingston, RI. 


Evaluate

$$\sum_{n=1}^{\infty} \frac{(-1)^n}{F_n F_{n+2}},$$

where $(F_n)$ denotes the sequence of Fibonacci numbers.


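Composite solution and generalization by Anatoly S. Izotov, Mining Institute, Novosibirsk,
Russia, the University of Wyoming Problem Circle, and the editors. The answer is $2 - \sqrt{5}$.
More generally, suppose $(p_n)$ is a sequence satisfying $p_n = a p_{n-1} + p_{n-2}$ for $n \ge 2$, where
$a > 0$. If $k \ge 1$ and $p_k p_1 - p_{k+1} p_0 \ne 0$, we prove that

$$\lim_{n \to \infty} \frac{p_{n-1}}{p_n} = \frac{\sqrt{a^2 + 4} - a}{2}$$

and that

$$\sum_{n=1}^{\infty} \frac{(-1)^n}{p_n p_{n+k}} = \frac{1}{p_k p_1 - p_{k+1} p_0} \left( \sum_{n=1}^{k} \frac{p_{n-1}}{p_n} - \frac{k \left( \sqrt{a^2 + 4} - a \right)}{2} \right).$$

From the characteristic equation for the recurrence, we have $p_n = A\alpha^n + B\beta^n$, where
A, B are constants and $\alpha = \left( a + \sqrt{a^2 + 4} \right)/2$, $\beta = \left( a - \sqrt{a^2 + 4} \right)/2$. Since
$p_k p_1 - p_{k+1} p_0 \ne 0$, neither A nor B is zero. Because $|\beta/\alpha| < 1$, we have

$$\lim_{n \to \infty} \frac{p_{n-1}}{p_n} = \frac{1}{\alpha} = \frac{\sqrt{a^2 + 4} - a}{2}.$$

For fixed $k \ge 0$, it follows easily by induction on n that

$$p_{n-1} p_{n+k} - p_{n+k-1} p_n = (-1)^n \left( p_k p_1 - p_{k+1} p_0 \right)$$

for $n \ge 1$. Thus,

$$\begin{aligned}
\sum_{n=1}^{N} \frac{(-1)^n}{p_n p_{n+k}}
&= \sum_{n=1}^{N} \frac{1}{p_k p_1 - p_{k+1} p_0} \cdot \frac{p_{n-1} p_{n+k} - p_{n+k-1} p_n}{p_n p_{n+k}} \\
&= \frac{1}{p_k p_1 - p_{k+1} p_0} \sum_{n=1}^{N} \left( \frac{p_{n-1}}{p_n} - \frac{p_{n+k-1}}{p_{n+k}} \right) \\
&= \frac{1}{p_k p_1 - p_{k+1} p_0} \left( \sum_{n=1}^{k} \frac{p_{n-1}}{p_n} - \sum_{n=N+1}^{N+k} \frac{p_{n-1}}{p_n} \right).
\end{aligned}$$

Taking the limit as N approaches infinity completes the proof.


When k = 2 and $(p_n)$ is the Fibonacci sequence with $F_1 = F_2 = 1$, the value of the series
reduces to $2 - \sqrt{5}$.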
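
The stated value for the Fibonacci case is easy to sanity-check numerically. The following sketch sums the first terms of the series in floating point; the function name `partial_sum` and the cutoff of 40 terms are arbitrary choices made here, and the convention $F_1 = F_2 = 1$ is the one given above.

```python
from math import sqrt


def partial_sum(num_terms):
    """Sum (-1)^n / (F_n * F_{n+2}) for n = 1, ..., num_terms, with F_1 = F_2 = 1."""
    f_n, f_n1, f_n2 = 1, 1, 2  # F_n, F_{n+1}, F_{n+2} for n = 1
    total = 0.0
    sign = -1  # (-1)^n for n = 1
    for _ in range(num_terms):
        total += sign / (f_n * f_n2)
        sign = -sign
        f_n, f_n1, f_n2 = f_n1, f_n2, f_n1 + f_n2
    return total


approx = partial_sum(40)
target = 2 - sqrt(5)
print(approx, target)
```

Because the terms alternate in sign and shrink geometrically, forty terms already agree with $2 - \sqrt{5}$ to within floating-point accuracy.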


Editorial comment. The argument here follows the same steps that most solvers used 
directly to sum the series posed in the problem statement. Seven solvers mentioned the 
generalization for arbitrary k when (p) is the Fibonacci series, found in Brother Alfred 
Brousseau, “Summation of infinite Fibonacci series”, Fibonacci Quarterly, 7(1969), 143- 
168. Seung-Jin Bang cited Br. J. M. Mahon and Alwyn F. Horadam, “Infinite series 
summation involving reciprocals of Pell polynomials”, Fibonacci Numbers and Their Appli- 
cations, (A.N. Philippou, G.E. Bergum, and A.F. Horadam, eds.), D.Reidel, 1986, p.168, 
which studied series determined by other recurrences. 


Solved by 80 readers and the proposer. Two incorrect solutions were received. 


Largest Product of Distances to Vertices 


10282 [1993, 184]. Proposed by Paul Erdős, Hungarian Academy of Sciences, Budapest,
Hungary.


Let A, B, C be the vertices of a triangle inscribed in a unit circle, and let P be a point in
the interior of the triangle ABC. Show that

$$|PA| \cdot |PB| \cdot |PC| < \frac{32}{27}.$$


Solution I by B. J. Venkatachala, Indian Institute of Science, Bangalore, India. Suppose P
is a point on the boundary of the triangle ABC, say on the side BC. Let x be the distance of
P from the center of the unit circle that circumscribes the triangle ABC. It is well known
that, for all chords XY through the point P, the product |PX|·|PY| is constant. Therefore,
|PB|·|PC| = (1 − x)(1 + x) and |PA| ≤ 1 + x. Thus

$$|PA| \cdot |PB| \cdot |PC| \le (1 - x)(1 + x)^2.$$

The expression on the right assumes its maximum value 32/27 in the interval [0, 1] at
x = 1/3. Finally, by the maximum modulus principle, |PA|·|PB|·|PC| < 32/27 for any
point P in the interior of the triangle ABC.


Solution II by Harry D. Ruderman, Bronx, NY. Let T be the product of the distances from
P to the vertices A, B, C. Slide A along the circumference to increase |PA|, if possible.
Either A can reach a point where |PA| attains a local maximum, in which case P, O, A are
collinear, or it reaches a point where further movement will lead to a triangle not containing
P in its interior, in which case P will be on one of the sides AB or AC. Repeating this
operation for the other vertices will lead to a triangle in which P lies on a side of ABC, say
BC, and P, O, A are collinear. If O is not between A and P, replace A by the opposite end
of the diameter through P, increasing T. Now, rotate BC about P until it is perpendicular
to AOP. This does not change T. With x = |OP|, the Pythagorean theorem gives
$|PB| = |PC| = \sqrt{1 - x^2}$, and $T = (1 + x)(1 - x^2)$, which is maximal when x = 1/3 and
T = (4/3)(8/9) = 32/27.
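
Both solutions reduce the problem to maximizing $(1 - x)(1 + x)^2 = (1 + x)(1 - x^2)$ on [0, 1]. A short check in exact rational arithmetic is sketched below; the name `bound` and the grid of 1001 sample points are illustrative choices, and a grid search only supports, rather than proves, the location of the maximum.

```python
from fractions import Fraction


def bound(x):
    """Upper bound (1 - x)(1 + x)^2 for the product of distances, with x = |OP|."""
    return (1 - x) * (1 + x) ** 2


value_at_third = bound(Fraction(1, 3))

# Exact evaluation on a uniform grid over [0, 1]; no grid point exceeds the value at 1/3.
grid_max = max(bound(Fraction(i, 1000)) for i in range(1001))

print(value_at_third, float(grid_max))
```

The derivative of the bound is $(1 + x)(1 - 3x)$, which vanishes on [0, 1] only at x = 1/3, in agreement with the grid search.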


Editorial comment. The solutions above are the most efficient of the two principal types
received. Restriction to the case in which P is on the boundary avoids the need to express
the product of the distances from a general point. Restriction to symmetric configurations,
as in solution II, was a common approach. Note that configurations attaining the maximal
value are easily characterized.

A generalization to an n-gon inscribed in a circle was obtained by: Roy Barbara, Kevin
Brown, Hans Georg Killingbergtrø & Ivar Skau, Murray S. Klamkin, Neela Lakshmanan,
Arthur J. Rosenthal & Radha G. Nath, Ossama A. Saleh & Terry J. Walters, and Ajaj Tarabay.
The bound in this case is the maximum value of $(1 - x)(1 + x)^{n-1}$, which is $(2/n)^n (n - 1)^{n-1}$.
A generalization to higher dimensions was given by Faruk F. Abi-Khuzam & Ajaj Tarabay,
and a generalization to weighted products for polygons was given by Alladi Ramakrishnan.
This problem has appeared as problem 1895 in Crux Mathematicorum (December 1993),
and it has been reported that the problem also appears in Bull. Math. (Wuhan) 1990, no. 3,
p. 17 with solution in 1991, no. 10, p. 42.

The solution by Harry D. Ruderman was received on February 27, 1993. This was the 
last solution that we received from him, and he died a little over a year later. He was an 
active contributor to the Problem Section since joining the Association as a student in 1931, 
beginning with a published solution to problem 299 [1914, 297; 1931,172; 1931, 462]. 
Other notable early contributions were solutions (not published in full, but summarized) 
to 3746 [1935, 454; 1937, 400] and 3848 [1937, 667; 1940, 575], which were geometry 
problems proposed by Paul Erdős.


Solved by 61 readers (including those cited) and the proposer. 


Collaborating editors: David F. Appleyard, Paul T. Bateman, Bruce C. Berndt, Duane 
M. Broline, Barry W. Brunson, Frank S. Cater, Gulbank D. Chakerian, Underwood Dudley, 
Gerald A. Edgar, Michael A. Filaseta, Ira M. Gessel, Richard A. Gibbs, Jerrold R. Griggs, 
Douglas A. Hensley, John R. Isbell, Mourad E. H. Ismail, Murray Klamkin, Daniel J. Kleitman, 
Frederick W. Luttmann, Frank B. Miles, Richard Pfiefer, Stephen L. Portnoy, J. O. Shallit, John 
Henry Steelman, Kenneth B. Stolarsky, David E. Tepper, Douglas B. Tyler, Daniel Ullman, and 
William E. Watkins. 


Raphael M. Robinson 


In the February 1995 issue of the Monthly, the article by Deborah 
Haimo, Experimentation and Conjecture Are Not Enough, contained 
a reference to Abraham Robinson and the Banach Tarski Paradox. 
That paragraph should have read: 


Banach and Tarski succeeded in showing the 
fascinating and nonintuitive result that, say, the earth can be 
decomposed into a finite number of pieces which can be re- 
assembled to form a marble, or even to form two earths 
each of the same size as the original. John von Neumann 
added to these amazing facts the observation that only nine 
pieces are needed for the decomposition of one sphere into 
two, all with the same radii. Raphael M. Robinson went 
further yet in 1947 showing that five pieces will suffice! 


With sadness, we also report the death of Raphael M.
Robinson on January 27, 1995. He was a longtime member of the
MAA, and a valued friend of the Monthly. We will miss his friendly
letters and advice.
Alan Turing
and the Central Limit Theorem 


S. L. Zabell 


Although the English mathematician Alan Mathison Turing (1912-1954) is remembered
today primarily for his work in mathematical logic (Turing machines
and the “Entscheidungsproblem”), machine computation, and artificial intelligence
(the “Turing test”), his name is not usually thought of in connection with
either probability or statistics. One of the basic tools in both of these subjects is the 
use of the normal or Gaussian distribution as an approximation, one basic result 
being the Lindeberg-Feller central limit theorem taught in first-year graduate 
courses in mathematical probability. No one associates Turing with the central 
limit theorem, but in 1934 Turing, while still an undergraduate, rediscovered a 
version of Lindeberg’s 1922 theorem and much of the Feller-Lévy converse to it 
(then unpublished). This paper discusses Turing’s connection with the central limit 
theorem and its surprising aftermath: his use of statistical methods during World 
War II to break key German military codes. 


1. INTRODUCTION. Turing went up to Cambridge as an undergraduate in the 
Fall Term of 1931, having gained a scholarship to King’s College. (Ironically, 
King’s was his second choice; he had failed to gain a scholarship to Trinity.) Two 
years later, during the course of his studies, Turing attended a series of lectures on 
the Methodology of Science, given in the autumn of 1933 by the distinguished 
astrophysicist Sir Arthur Stanley Eddington. One topic Eddington discussed was 
the tendency of experimental measurements subject to errors of observation to 
often have an approximately normal or Gaussian distribution. But Eddington’s 
heuristic sketch left Turing dissatisfied; and Turing set out to derive a rigorous 
mathematical proof of what is today termed the central limit theorem for indepen- 
dent (but not necessarily identically distributed) random variables. 

Turing succeeded in his objective within the short span of several months (no 
later than the end of February 1934). Only then did he find out that the problem 
had already been solved, twelve years earlier, in 1922, by the Finnish mathemati- 
cian Jarl Waldemar Lindeberg (1876-1932). Despite this, Turing was encouraged 
to submit his work, suitably amended, as a Fellowship Dissertation. (Turing was 
still an undergraduate at the time; students seeking to become a Fellow at a 
Cambridge college had to submit evidence of original work, but did not need to 
have a Ph.D. or its equivalent.) This revision, entitled “On the Gaussian Error
Function,” was completed and submitted in November, 1934. On the strength of
this paper Turing was elected a Fellow of King’s four months later (March 16,
1935) at the age of 22, his nomination supported by the group theorist Philip Hall
and the economists John Maynard Keynes and Alfred Cecil Pigou. Later that year
the paper was awarded the prestigious Smith’s prize by the University (see Hodges,
1983).


Turing never published his paper. Its major result had been anticipated, 
although, as will be seen, it contains other results that were both interesting and 
novel at the time. But in the interim Turing’s mathematical interests had taken a 
very different turn. During the spring of 1935, awaiting the outcome of his 
application for a Fellowship at King’s, Turing attended a course of lectures by the 
topologist M. H. A. Newman on the Foundations of Mathematics. During the 
International Congress of Mathematicians in 1928, David Hilbert had posed three 
questions: is mathematics complete (that is, can every statement in the language of 
number theory be either proved or disproved?), is it consistent, and is it decidable? 
(This last is the Entscheidungsproblem, or the “decision problem”; does there exist
an algorithm for deciding whether or not a specific mathematical assertion does or
does not have a proof.) Kurt Gödel had shown in 1931 that the answer to the first
question is no (the so-called “first incompleteness theorem”); and that if number
theory is consistent, then a proof of this fact does not exist using the methods of
the first-order predicate calculus (the “second incompleteness theorem”). Newman
had proved the Gödel theorems in his course, but he pointed out that the third of
Hilbert’s questions, the Entscheidungsproblem, remained open. 

This challenge attracted Turing, and in short order he had arrived at a solution 
(in the negative), using the novel device of Turing machines. The drafting of the 
resulting paper (Turing, 1937), dominated Turing’s life for a year from the Spring 
of 1935 (Hodges, 1983, p. 109); and thus Turing turned from mathematical 
probability, never to return. 

A copy of Turing’s Fellowship Dissertation survives, however, in the archives of 
the King’s College Library; and its existence raises an obvious question. Just how 
far did a mathematician of the calibre of Turing get in this attack on the central 
limit theorem, one year before he began his pioneering research into the founda- 
tions of mathematical logic? The answer to that question is the focus of this paper. 


2. THE CENTRAL LIMIT THEOREM. The earliest version of the central limit
theorem (CLT) is due to Abraham de Moivre (1667-1754). If $X_1, X_2, X_3, \ldots$ is an
infinite sequence of 1’s and 0’s recording whether a success ($X_n = 1$) or failure
($X_n = 0$) has occurred at each stage in a sequence of repeated trials, then the sum
$S_n = X_1 + X_2 + \cdots + X_n$ gives the total number of successes after n trials. If the
trials are independent, and the probability of a success at each trial is the same,
say $P[X_n = 1] = p$, $P[X_n = 0] = 1 - p$, then the probability of seeing exactly k
successes in n trials has a binomial distribution:

$$P[S_n = k] = \frac{n!}{k!\,(n - k)!}\, p^k (1 - p)^{n-k}.$$


If n is large (for example, 10,000), then as de Moivre noted, the direct 
computation of binomial probabilities “is not possible without labor nearly im- 
mense, not to say impossible”; and for this reason he turned to approximate 
methods (see Diaconis and Zabell, 1991): using Stirling’s approximation (including 
correction terms) to estimate the individual terms in the binomial distribution and 
then summing, de Moivre discovered the remarkable fact that 


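$$\lim_{n \to \infty} P\left[ a < \frac{S_n - np}{\sqrt{np(1 - p)}} < b \right] = \frac{1}{\sqrt{2\pi}} \int_a^b \exp\left( -\tfrac{1}{2} x^2 \right) dx,$$

or $\Phi(b) - \Phi(a)$, where $\Phi(x)$ is the cumulative distribution function of the standard
normal (or Gaussian) distribution:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left( -\tfrac{1}{2} t^2 \right) dt.$$
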

During the 19th and 20th centuries this result was extended far beyond the 
simple coin-tossing setup considered by de Moivre, important contributions being 
made by Laplace, Poisson, Chebyshev, Markov, Liapunov, von Mises, Lindeberg, 
Lévy, Bernstein, and Feller; see Adams (1974), Maistrov (1974), Le Cam (1986), 
and Stigler (1986) for further historical information. Such investigations revealed 
that if $X_1, X_2, X_3, \ldots$ is any sequence of independent random variables having
the same distribution, then the sum $S_n$ satisfies the CLT provided suitable
centering and scaling constants are used: the centering constant np in the binomial
case is replaced by the sum of the expectations $E[X_n]$; the scaling constant
$\sqrt{np(1 - p)}$ is replaced by the square root of the sum of the variances $\mathrm{Var}[X_n]$
(provided these are finite).
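
De Moivre's approximation in the binomial case can be compared with the exact probability for the value n = 10,000 mentioned above. The sketch below is an illustration only: the function names are chosen here, the choice p = 1/2 with a = −1 and b = 1 is an assumption made for the example, and the exact binomial terms are evaluated through logarithms of factorials (`math.lgamma`) rather than by de Moivre's own use of Stirling's series.

```python
from math import erf, exp, lgamma, log, sqrt


def binomial_pmf(n, k, p):
    """P[S_n = k] for a binomial(n, p) sum, computed via logarithms to avoid underflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def exact_probability(n, p, a, b):
    """P[a < (S_n - np) / sqrt(np(1 - p)) < b], summed term by term."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return sum(binomial_pmf(n, k, p) for k in range(n + 1)
               if a < (k - mean) / sd < b)


n, p, a, b = 10_000, 0.5, -1.0, 1.0
exact = exact_probability(n, p, a, b)
normal_approx = std_normal_cdf(b) - std_normal_cdf(a)
print(exact, normal_approx)
```

The two numbers differ by well under 0.01 for these parameters; most of the gap comes from the discreteness of the binomial sum, which the limit statement ignores.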

Indeed, it is not even necessary for the random variables X, contributing to the 
sum S, to have the same distribution, provided that no one term dominates the 
sum. Of course this has to be made precise. The best result is due to Lindeberg. 
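Suppose $E[X_n] = 0$, $0 < \mathrm{Var}[X_n] < \infty$, $s_n^2 =: \mathrm{Var}[S_n]$, and

$$\Lambda_n(\varepsilon) = \sum_{k=1}^{n} E\left[ \left( \frac{X_k}{s_n} \right)^2 ; \frac{|X_k|}{s_n} \ge \varepsilon \right].$$

(The notation $E[X; Y \ge \varepsilon]$ means the expectation of X restricted to outcomes
ω such that $Y(\omega) \ge \varepsilon$.) The Lindeberg condition is the requirement that

$$\Lambda_n(\varepsilon) \to 0, \quad \forall \varepsilon > 0; \tag{2.1}$$

and the Lindeberg central limit theorem (Lindeberg, 1922) states that if the
sequence of random variables $X_1, X_2, \ldots$ satisfies the Lindeberg condition (2.1),
then for all a < b,

$$\lim_{n \to \infty} P\left[ a < \frac{S_n}{s_n} < b \right] = \Phi(b) - \Phi(a). \tag{2.2}$$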


Despite its technical appearance, the Lindeberg condition turns out to be a
natural sufficient condition for the CLT. There are two reasons for this. First, the
Lindeberg condition has a simple consequence: if $\sigma_k^2 =: \mathrm{Var}[X_k]$, then

$$\rho_n^2 = \max_{k \le n} \left( \frac{\sigma_k^2}{s_n^2} \right) \to 0. \tag{2.3}$$

Thus, if the sequence $X_1, X_2, X_3, \ldots$ satisfies the Lindeberg condition, the variance
of an individual term $X_k$ in the sum $S_n$ is asymptotically negligible. Second,
for such sequences the Lindeberg condition is necessary as well as sufficient for
the CLT to hold, a beautiful fact discovered (independently) by William Feller and
Paul Lévy in 1935. In short: (2.1) ⟺ (2.2) + (2.3).
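
Condition (2.3) involves only the variances, so it can be illustrated without simulating any random variables. The sketch below computes $\rho_n^2$ for two hypothetical variance sequences chosen here for contrast (constant variances, and variances $2^k$); the function name `feller_ratio` is likewise an illustrative choice. It checks only (2.3), not the Lindeberg condition (2.1) itself.

```python
def feller_ratio(variances):
    """Return rho_n^2 = max_k sigma_k^2 / s_n^2 for independent terms with the given variances."""
    return max(variances) / sum(variances)


# Equal variances: no single term dominates, and rho_n^2 = 1/n tends to 0.
equal = [feller_ratio([1.0] * n) for n in (10, 100, 1000)]

# Variances 2^k: the last term always carries more than half of the total
# variance, so rho_n^2 stays above 1/2 and (2.3) fails.
geometric = [feller_ratio([2.0 ** k for k in range(1, n + 1)]) for n in (5, 10, 20)]

print(equal, geometric)
```

In the second example the last summand is never asymptotically negligible, which is exactly the situation in which normal convergence, if it occurs at all, must arise in some other way.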

If, in contrast, the Feller-Lévy condition (2.3) fails, then it turns out that
convergence to the normal distribution can occur in a fashion markedly different
from that of the CLT. If (2.3) does not hold, then there exist a number ρ > 0 and
two sequences of positive integers $\{m_k\}$ and $\{n_k\}$, with $\{n_k\}$ strictly increasing, such
that

$$1 \le m_k \le n_k \ \text{for all } k \quad \text{and} \quad \mathrm{Var}\left[ \frac{X_{m_k}}{s_{n_k}} \right] = \frac{\sigma_{m_k}^2}{s_{n_k}^2} \ge \rho^2 > 0. \tag{2.4}$$

Feller (1937) showed that if normal convergence occurs (that is, condition (2.2)
holds), but condition (2.4) also obtains, then

$$\frac{X_{m_k}}{\sigma_{m_k}} \Rightarrow N(0, 1).$$

That is, there exists a subsequence $X_{m_k}$ whose contributions to the sums $S_{n_k}$ are
nonnegligible (relative to $s_{n_k}$) and which, properly scaled, converges to the standard
normal distribution. (The symbol “⇒” denotes convergence in distribution;
$N(\mu, \sigma^2)$ the normal distribution having expectation μ, variance $\sigma^2$.)


Note. For the purposes of brevity, this summary of the contributions of Feller and 
Lévy simplifies a much more complex story; see Le Cam (1986) for a more detailed 
account. (Or better, consult the original papers themselves!) 


3. TURING’S FELLOWSHIP DISSERTATION. Turing’s fellowship dissertation 
was written twelve years after Lindeberg’s work had appeared, and shortly before 
the work of Feller and Lévy. There are several aspects of the paper that demon- 
strate Turing’s insight into the basic problems surrounding the CLT. One of these 
is his decision, contrary to a then common textbook approach (see, e.g., Burnside, 
1928, pp. 87-90), but crucial if the best result is to be obtained (and the approach 
also adopted by Lindeberg), to work at the level of distribution functions (i.e., the
function $F(t) =: P[X \le t]$) rather than densities (the derivatives of the distribution
functions). In Appendix B Turing notes:


I have attempted to obtain some results [using densities]... but without success. The reason is
clear. In order that the shape frequency functions $u_n(x)$ of $f_n(x)$ should tend to the shape
frequency function $\phi(x)$ of the Gaussian error much heavier restrictions on the functions $g_n(x)$
are required than is needed if we only require that $U_n \to \Phi$. It became clear to me... that it
would be better to work in terms of distribution function throughout.


This was an important insight. Although versions of the central limit theorem do 
exist for densities, these ordinarily require stronger assumptions than just the 
Lindeberg condition (2.1); see, e.g., Feller (1971), pp. 516-517, Petrov (1975), 
Chapter 7. 

Let us now turn to the body of Turing’s paper, and consider it, section by 
section. 


3.1. Basic Structure of the Paper. The first seven sections of the paper (pp. 1-6) 
summarize notation and the basic properties of distribution functions. Section 1 
summarizes the problem; Section 2 defines the distribution function F (abbreviated
DF) of an “error” ε; Section 3 summarizes the basic properties of the
expectation and mean square deviation (MSD) of a sum of independent errors; 
rigorous proofs in terms of the distribution function are given in an appendix at 
the end of the paper (Appendix C). Section 4 discusses the distribution function of 
a sum of independent errors, the sum distribution function (SDF), in terms of the 
distribution functions of each term in the sum, and derives the formula for $F \oplus G$,
the convolution of two distribution functions. Section 5 then introduces the
concept of the shape function (SF); the standardization of a distribution function F
to have zero expectation and unit MSD; thus, if F has expectation μ and MSD $\sigma^2$
(σ > 0), then the shape function of F is $U(x) =: F(\sigma(x - \mu))$. (Turing uses the
symbols “a” and “$k^2$” to denote μ and $\sigma^2$; several other minor changes in
notation of this sort are made below.)

In Section 6 Turing then states the basic problem to be considered: given a
sequence of errors $\varepsilon_n$, having distribution functions $G_n$, shape functions $V_n$, means
$\mu_n$, mean square deviations $\sigma_n^2$, sum distribution functions $F_n$, and shape functions
$U_n$ for each $F_n$, under what conditions do the shape functions $U_n(x)$ converge
uniformly to $\Phi(x)$, the “SF of the Gaussian Error”? Turing then assumes for
simplicity that $\mu_n = 0$ and $\sigma_n^2 < \infty$. In Section 7 (“Fundamental Property of the
Gaussian Error”), he notes the only properties of Φ that are used in deriving
sufficient conditions for normal convergence are that it is an SF, and the “self-reproductive
property” of Φ: that is, if $X_1 \sim N(0, \sigma_1^2)$ and $X_2 \sim N(0, \sigma_2^2)$ are
independent, then $X_1 + X_2 \sim N(0, \sigma_1^2 + \sigma_2^2)$. (The notation “$X \sim N(\mu, \sigma^2)$”
means that the random variable X has the distribution $N(\mu, \sigma^2)$.)


3.2. The Quasi-Necessary Conditions. It is at this point that Turing comes to the 
heart of the matter. In Section 8 (“The Quasi-Necessary Conditions”) Turing
notes 


The conditions we shall impose fall into two groups. Those of one group (the quasi-necessary 
conditions) involve the MSDs only. They are not actually necessary, but if they are not fulfilled 
$U_n$ can only tend to $\Phi$ by a kind of accident.


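The two conditions that Turing refers to as the “quasi-necessary” conditions are:

$$\sum_{k=1}^{\infty} \sigma_k^2 = \infty \quad \text{and} \quad \frac{\sigma_n^2}{s_n^2} \to 0. \tag{3.1}$$

It is easy to see that Turing’s condition (3.1) is equivalent to condition (2.3).
(That (2.3) $\Rightarrow$ (3.1) is immediate. To see (3.1) $\Rightarrow$ (2.3): given $\varepsilon > 0$, choose $M \ge 1$
so that $\sigma_n^2/s_n^2 < \varepsilon$ for $n \ge M$, and $N \ge M$ so that $\sigma_k^2/s_N^2 < \varepsilon$ for $1 \le k \le M$; if
$n \ge N$, then $\sigma_k^2/s_n^2 < \varepsilon$ for $1 \le k \le n$.)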

In his Theorems 4 and 5, Turing explores the consequences of the failure of 
either part of condition (3.1). Turing’s proof of Theorem 4 requires his 


Theorem 3. If X and Y are independent, and both X and X + Y are Gaussian, then 
Y is Gaussian. 


This is a special case of a celebrated theorem proven shortly thereafter by 
Harald Cramér (1936); if X and Y are independent, and X + Y is Gaussian, then 
both X and Y must be Gaussian. Lévy had earlier conjectured Cramér’s theorem 
to be true (in 1928 and again in 1935), but had been unable to prove it. Cramér’s 
proof of this result in 1936 in turn enabled Lévy to arrive at necessary and 
sufficient conditions for the CLT of a very general type (using centering and 
scaling constants other than the mean and standard deviation), and this in turn led 
Lévy to write his famous monograph, Théorie de l’Addition des Variables Aléatoires
(Lévy, 1937); see Le Cam (1986, pp. 80-81, 90). 

Cramér’s theorem is a hard fact; his original proof appealed to Hadamard’s 
theorem in the theory of entire functions. The special case of the theorem needed
by Turing is much simpler; it is an immediate consequence of the characterization
theorem for characteristic functions. To see this, let $\phi_X(t) =: E[\exp(itX)]$ denote
the characteristic function of a random variable X; and suppose that X and Y are
independent, $X \sim N(0, \sigma^2)$, and $X + Y \sim N(0, \sigma^2 + \tau^2)$. Then

$$\exp\left(-\frac{\sigma^2 + \tau^2}{2}\,t^2\right) = \phi_{X+Y}(t) = \phi_X(t)\,\phi_Y(t) = \exp\left(-\frac{\sigma^2}{2}\,t^2\right)\phi_Y(t);$$

hence $\phi_Y(t) = \exp(-(\tau^2/2)t^2)$; thus $Y \sim N(0, \tau^2)$ because the characteristic function
of a random variable uniquely determines the distribution of that variable.
Turing’s proof, which uses distribution functions, is not much longer. 

It is an immediate consequence of Cramér’s theorem that if $S_n/s_n \Rightarrow N(0, 1)$,
but $\lim_{n \to \infty} s_n^2 < \infty$, then all the summands $X_j$ must in fact be Gaussian. But
Turing did not have this fact at his disposal, only his much weaker Theorem 3. His
Theorem 4 (phrased in the language of random variables) thus makes the much
more limited claim that if (a) $\sum_k \sigma_k^2 < \infty$, (b) $S_n$ converges to a Gaussian distribution,
and (c) $X_0$ is a random variable at once independent of the original sequence
$X_1, X_2, \ldots$ and having a distribution other than Gaussian, then the sequence
$S_n^* = X_0 + S_n$ cannot converge to the Gaussian distribution. In other words: if
$\sum_k \sigma_k^2 < \infty$, then “the convergence...to the Gaussian is so delicate that a single
extra term in the sequence ...upsets it” (p. 17).

Turing’s Theorem 5 in turn explores the consequences of the failure of (3.1) in
the case that $\sum_k \sigma_k^2 = \infty$, but $\rho_n^2 =: \sigma_n^2/s_n^2$ does not tend to zero as $n \to \infty$. The
statement of the theorem is somewhat technical in nature, but Turing’s later
summary of it captures the essential phenomenon involved:


If $F_n$ [the distribution function of $S_n$] tends to Gaussian and $\sigma_n^2/s_n^2$ does not tend to zero [but
$\sum \sigma_n^2 = \infty$] we can find a subsequence of $G_n$ [the distribution function of $X_n$] tending to
Gaussian.


Thus Turing had by some two years anticipated Feller’s discovery of the subsequence
phenomenon. (In Turing’s typescript, symbols such as “$F_n$” are entered by
hand; in the above quotation the space for “$F_n$” has by accident been left blank,
but the paragraph immediately preceding this one in the typescript makes it clear
that “$F_n$” is intended.)


3.3. The Sufficient Conditions. Turing states in his preface that he had been
“informed that an almost identical proof had been given by Lindeberg.” This
comment refers to the method of proof Turing uses, not the result obtained.
Turing’s method is to smooth the distribution functions $F_n(x)$ of the sum by
forming the convolution $F_n * \Phi(x/\rho)$, expand the result in a Taylor series to third
order, and then let the variance $\rho^2$ of the convolution term tend to zero. This is
similar to the method employed by Lindeberg. (There is an important difference,
however: Turing does not use Lindeberg’s “swapping” argument. For an attractive
modern presentation of the Lindeberg method, see Breiman, 1968, pp. 167-170;
for discussion of the method, Pollard’s comments in Le Cam, 1986, pp. 94-95.)
Turing does not, however, succeed in arriving at the Lindeberg condition (2.1)
as a sufficient condition for convergence to the normal distribution; the most
general sufficient condition he gives (on p. 27) is complex in appearance (although
it necessarily implies the Lindeberg condition). Turing concedes that his “form of
the sufficiency conditions is too clumsy for direct application,” but notes that it can
be used to “derive various criteria from it, of different degrees of directness and of
comprehensiveness” (p. 28). One of these holds if the summands $X_k$ all have the
same shape (that is, the shape functions $V_k(x) =: P[X_k/\sigma_k \le x]$ coincide); and
thus includes the special case of identically distributed summands having a second
moment. (This was no small feat, since even this special case of the more general
Lindeberg result had eluded proof until the publication of Lindeberg’s paper.)
One formulation of this criterion, equivalent to the one actually stated by
Turing, is: there exists a function $J: \mathbb{R}^+ \to \mathbb{R}^+$ such that $\lim_{t \to \infty} J(t) = 0$, and

$$E\left[\left(\frac{X_k}{\sigma_k}\right)^2 ;\ \left|\frac{X_k}{\sigma_k}\right| \ge t\right] \le J(t) \quad \text{for all } k \ge 1,\ t \ge 0. \tag{3.2}$$


In turn one simple sufficient condition for this given by Turing (pp. 30-31) is
that there exists a function $\phi$ such that $\phi(x) > 0$ for all x, $\lim_{x \to \pm\infty} \phi(x) = \infty$, and

$$\sup_k E\left[\left(\frac{X_k}{\sigma_k}\right)^2 \phi\left(\frac{X_k}{\sigma_k}\right)\right] < \infty. \tag{3.3}$$

(Note that unfortunately one important special case not covered by either of these
conditions is that the $X_k$ are uniformly bounded: $|X_k| \le C$ for some C and all
$k \ge 1$.)

In assessing this portion of Turing’s paper, it is important to keep two points in
mind. First, Turing states in his preface that “since reading Lindeberg’s paper I
have for obvious reasons made no alterations to that part of the paper which is
similar to his.” The manuscript is thus necessarily incomplete; it presumably would
have been further polished and refined had Turing continued to work on it; the
technical sufficient conditions given represent how far Turing had gotten on the
problem prior to seeing Lindeberg’s work. Second, in 1934 the Lindeberg condition
was only known to be sufficient, not necessary; thus even in discussing his
results in other sections of the paper (where he felt free to refer to the Lindeberg
result), it may not have seemed important to Turing to contrast his own particular
technical sufficient conditions with those of Lindeberg; the similarity in method
must have seemed far more important.


3.4. One Counterexample. In Section 14 Turing concludes by giving a simple
example of a sequence $X_1, X_2, \ldots$ that satisfies the quasi-necessary conditions
(3.1), but not the CLT. For $n \ge 1$, let

$$P[X_n = \pm n] = \frac{1}{2n^2}; \qquad P[X_n = 0] = 1 - \frac{1}{n^2}.$$

Then $E[X_n] = 0$, $\mathrm{Var}[X_n] = E[X_n^2] = 1$, $s_n^2 = \mathrm{Var}[S_n] = n \to \infty$, and $\rho_n^2 =
1/n \to 0$; thus (3.1) is satisfied. Turing then shows that if $S_n/s_n$ converges, the
limit distribution must have a discontinuity at zero, and therefore cannot be 
Gaussian. 

It is interesting that Turing should happen to choose this particular example;
although he does not note it, the sequence $\{S_n/s_n : n \ge 1\}$ has the property that
$\mathrm{Var}[S_n/s_n] = 1$, but $\lim_{n \to \infty} S_n(\omega)/s_n = 0$ for almost all sample paths $\omega$. This is
an easy consequence of the first Borel-Cantelli lemma: because

$$\sum_{n=1}^{\infty} P[X_n \ne 0] = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} < \infty,$$


it follows that $P[X_n \ne 0 \text{ infinitely often}] = 0$; thus $P[\sup_n |S_n| < \infty] = 1$ and
$P[\lim_{n \to \infty} S_n/s_n = 0] = 1$.

The existence of such sequences has an interesting consequence for the CLT.
Let $\{Y_n : n \ge 1\}$ be a sequence of independent random variables, jointly independent
of the sequence $\{X_n : n \ge 1\}$ and such that $P[Y_n = \pm 1] = \frac{1}{2}$. Let $T_n = Y_1 +
Y_2 + \cdots + Y_n$; then a trite calculation shows that $S_n + T_n$ satisfies the Feller
condition (2.3), but not the Lindeberg condition (2.1). Let $t_n^2 =: \mathrm{Var}[T_n]$; then
$T_n/t_n \Rightarrow N(0, 1)$ and $\mathrm{Var}[S_n + T_n] = s_n^2 + t_n^2$, hence

$$\frac{S_n + T_n}{\sqrt{\mathrm{Var}[S_n + T_n]}} = \frac{s_n}{\sqrt{s_n^2 + t_n^2}} \left(\frac{S_n}{s_n}\right) + \frac{t_n}{\sqrt{s_n^2 + t_n^2}} \left(\frac{T_n}{t_n}\right) \Rightarrow N(0, \tfrac{1}{2}).$$

Thus the sequence $S_n + T_n$ does converge to a Gaussian distribution! This does
not, however, contradict the Feller converse to the Lindeberg CLT; that result
states that $S_n + T_n$, rescaled to have unit variance, cannot converge to the
standard Gaussian $N(0, 1)$.
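
The behaviour of Turing's example can be checked by a short exact computation. The sketch below is an illustration only, not something taken from Turing's paper: the function names are hypothetical, and it uses nothing beyond the distribution of $X_n$ given above. On the event $X_2 = \cdots = X_n = 0$ one has $S_n = X_1 = \pm 1$, so $|S_n/s_n| = 1/\sqrt{n}$. The probability of that event is the product of the factors $1 - 1/k^2$, which telescopes to $(n+1)/(2n)$ and therefore always exceeds one half, whereas a standard Gaussian assigns vanishing mass to the interval $[-1/\sqrt{n}, 1/\sqrt{n}]$.

```python
from fractions import Fraction
from math import erf, sqrt


def prob_all_zero_after_first(n: int) -> Fraction:
    """Exact P[X_2 = ... = X_n = 0] when P[X_k = 0] = 1 - 1/k^2, independently."""
    p = Fraction(1)
    for k in range(2, n + 1):
        p *= 1 - Fraction(1, k * k)
    return p


def gaussian_mass(c: float) -> float:
    """P[|Z| <= c] for a standard Gaussian Z."""
    return erf(c / sqrt(2))


for n in (10, 100, 1000):
    # Lower bound on P[|S_n / s_n| <= 1/sqrt(n)] versus the Gaussian mass of the same interval.
    lower_bound = prob_all_zero_after_first(n)
    print(n, float(lower_bound), gaussian_mass(1 / sqrt(n)))
```

The first printed column stays above one half while the second shrinks, which is the numerical face of the discontinuity at zero noted by Turing.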


4. DISCUSSION. Turing’s Fellowship Dissertation tells us something about 
Turing, something about the state of mathematical probability at Cambridge in the 
1930s, and something about the general state of mathematical probability during 
that decade. 

I. J. Good (1980, p. 34) has remarked that when Turing “attacked a problem he
started from first principles, and he was hardly influenced by received opinion. 
This attitude gave depth and originality to his thinking, and also it helped him to 
choose important problems.” This observation is nicely illustrated by Turing’s work 
on the CLT. His dissertation is, viewed in context, a very impressive piece of work. 
Coming to the subject as an undergraduate, his knowledge of mathematical 
probability was apparently limited to some of the older textbooks such as “Czuber,
Morgan Crofton, and others” (Preface, p. ii). Despite this, Turing immediately 
realized the importance of working at the level of distribution functions rather 
than densities; developed a method of attack similar to Lindeberg’s; obtained 
useful sufficient conditions for convergence to the normal distribution; identified 
the conditions necessary for true central limit behavior to occur; understood the 
relevance of a Cramér-type factorization theorem in the derivation of such neces- 
sary conditions; and discovered the Feller subsequence phenomenon. If one 
realizes that the defects of the paper, such as they are, must largely reflect the fact 
that Turing had ceased to work on the main body of it after being apprised of 
Lindeberg’s work, it is clear that Turing had penetrated almost immediately to the 
heart of a problem whose solution had long eluded many mathematicians far 
better versed in the subject than he. (It is interesting to note that Lindeberg was 
also a relative outsider to probability theory, and only began to work in the field a 
few years before 1922.) 

The episode also illustrates the surprisingly backward state of mathematical 
probability in Cambridge at the time. Turing wrote to his mother in April, 1934: “I 
am sending some research I did last year to Czuber in Vienna [the author of 
several excellent German textbooks on mathematical probability], not having found 
anyone in Cambridge who is interested in it. I am afraid however that he may be 
dead, as he was writing books in 1891” (Hodges, 1983, p. 88). (Czuber had in fact 
died nearly a decade before, in 1925.)

This disinterest is particularly surprising in the case of G. H. Hardy, who was
responsible for a number of important results in probabilistic number theory. But
anyone who has studied the Hardy-Ramanujan proof of the distribution of prime
divisors of an integer (1917), and compared it to Turán’s (see Kac, 1959, pp. 71-74)
will realize at once that even the most rudimentary ideas of modern probability
must have been foreign to Hardy; see also Elliott (1979, pp. 1-5), Elliott (1980,
pp. 16-20). Indeed, Paul Erdős believes that “had Hardy known the even least
little bit of probability, with his amazing talent he would certainly have been able
to prove the law of the iterated logarithm” (Diaconis, 1993). Perhaps this
reflected in part the limited English literature on the subject. In 1927, when 
Harald Cramér visited England and mentioned to Hardy (his friend and former 
teacher) that he had become interested in probability theory, Hardy replied that 
“there was no mathematically satisfactory book in English on this subject, and 
encouraged me to write one” (Cramér, 1976, p. 516). 

Finally, Turing’s thesis illustrates the transitional nature of work in mathemati- 
cal probability during the decade of the 1930s, before the impact of Kolmogorov’s 
pioneering book Grundbegriffe der Wahrscheinlichkeitsrechnung (Kolmogorov, 1933) 
had been felt. In his paper Turing had thought it necessary to state and prove 
some of the most basic properties of distribution functions and their convolutions 
(in Sections 3 and 4, and Appendix C of the dissertation). His comment that his 
Appendix C “is only given for the sake of logical completeness and it is of little 
consequence whether it is original or not” (Preface, p. i), illustrates that such 
results, although “known,” did not enjoy general currency at the time. (It is all too
easy to overlook today the important milestone in the literature of the subject 
marked by the publication in 1946 of Harald Cramér’s important textbook 
Mathematical Methods of Statistics.) 

It is also interesting to note Turing’s approach to the problem in terms of 
convolutions of distribution functions rather than sums of independent random 
variables. Feller had similarly avoided the use of the language of random variables 
in his 1935 paper, formulating the problem instead in terms of convolutions. The 
reason, as Le Cam (1986, p. 87) notes, was that “Feller did not think that such 
concepts [as random variable] belonged in a mathematical framework. This was a 
common attitude in the mathematical community.” 

Current mathematical attitudes towards probability have changed so markedly 
from the distrust and scepticism of earlier times that today the sheer magnitude of 
the shift is often unappreciated. Joseph Doob, whose own work dates back to this 
period, notes that “even as late as the 1930s it was not quite obvious to some
probabilists, and it was certainly a matter of doubt to most nonprobabilists, that 
probability could be treated as a rigorous mathematical discipline. In fact it is clear 
from their publications that many probabilists were uneasy in their research until 
their problems were rephrased in what was then nonprobabilistic language” 
(Le Cam, 1986, pp. 93-94). 


5. EPILOGUE: BLETCHLEY PARK. After his fellowship dissertation Turing 
“always looked out for any statistical aspects of [a] problem under consideration” 
(Britton, 1992, p. ix). This trait of Turing is particularly striking in the case of his 
cryptanalytic work during the second world war. 

Turing left England for Princeton in 1936, to work with the logician Alonzo 
Church; he returned in 1938, after his Fellowship at King’s College had been 
renewed. Recruited almost immediately by GC & CS (the Government Code and
Cypher School), on September 4th, 1939 (one day after the outbreak of war) Turing
reported to Bletchley Park, the British cryptanalytic unit charged with breaking
German codes, soon rising to a position of considerable importance. (Turing’s
work at Bletchley was the subject of a 1987 London play, “Breaking the Code,”
written by Hugh Whitemore and starring Derek Jacobi, of “I, Claudius” fame.)

The staff at Bletchley Park included many gifted people, distinguished in a 
number of different fields; among these were the mathematicians M. H. A. 
Newman, J. H. C. Whitehead, Philip Hall, Peter Hilton, Shaun Wylie, David Rees, 
and Gordon Welchman; the international chessmasters C. H. O’D. Alexander, 
P. S. Milner-Barry, and Harry Golombek; and others such as Donald Michie
(today an important figure in artificial intelligence), Roy Jenkins (the later
Chancellor of the Exchequer), and Peter Benenson (the founder of Amnesty
International). Turing’s chief statistical assistant in the latter half of 1942 was another
mathematician, I. J. Good, fresh from studies under Hardy and Besicovitch at
Cambridge. (Good arrived at Bletchley on May 27, 1941, the day the Bismarck was
sunk.) In recent years Good has written several papers (Good 1979, 1980, 1992, 
1993a) discussing Turing’s ad hoc development of Bayesian statistical methods at 
Bletchley to assist in the decrypting of German messages. (More general accounts 
of the work at Bletchley include Lewin, 1978, Welchman, 1982, and Hinsley and 
Stripp, 1993; see also the bibliography in Good, 1992.) 

The specific details of Turing’s statistical contributions are too complex to go 
into here. (Indeed, much of this information was until recently still classified and, 
perhaps for this reason, Good’s initial papers on the subject do not even describe 
the specific cryptanalytic techniques developed by Turing; they give instead only a 
general idea of the type of statistical methods used. But in his most recent paper 
on this subject (Good, 1993a), Jack Good does provide a detailed picture of the 
various cryptanalytic techniques that Turing developed at Bletchley Park.) Three 
of Turing’s most important statistical contributions were: (1) his discovery, inde- 
pendently of Wald, of some form of sequential analysis; (2) his anticipation of 
empirical Bayes methods (later further developed in the 1950s by Good and 
independently by Herbert Robbins); and (3) his use of logarithms of the Bayes 
factor (termed by Good the “weight of evidence”) in the evaluation and execution
of decryption. (For many references to the concept of weight of evidence, see, for 
example, Good, 1993b and the two indices of Good, 1983.) The units for the 
logarithms, base 10, were termed bans and decibans: 


The reason for the name ban was that tens of thousands of sheets of paper were printed in the
town of Banbury on which weights of evidence were entered in decibans for carrying out an 
important process called Banburismus... . [Good, 1979, p. 394] 


“Tens of thousands of sheets of paper... .” This sentence makes it clear that 
Turing’s contributions in this area were not mere idle academic speculation, but an 
integral part of the process of decryption employed at Bletchley. 
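
As a rough illustration of the units involved: the weight of evidence in favour of a hypothesis H provided by an observation E is the logarithm of the Bayes factor P(E | H)/P(E | not H), and in decibans it is ten times the base-10 logarithm. The sketch below is not a reconstruction of Banburismus; the function names and the probabilities are invented purely for illustration. It shows only the arithmetic that makes the unit convenient, namely that weights from independent observations add.

```python
from math import log10


def weight_of_evidence_decibans(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Weight of evidence for H from E, in decibans: 10 * log10 of the Bayes factor."""
    return 10 * log10(p_e_given_h / p_e_given_not_h)


def posterior_odds(prior_odds: float, total_decibans: float) -> float:
    """Posterior odds = prior odds * Bayes factor, with the factor recovered from decibans."""
    return prior_odds * 10 ** (total_decibans / 10)


# Hypothetical likelihood pairs (P(E|H), P(E|not H)) for three independent observations.
observations = [(0.20, 0.02), (0.05, 0.10), (0.30, 0.15)]
weights = [weight_of_evidence_decibans(a, b) for a, b in observations]
total = sum(weights)  # independent pieces of evidence simply add on the log scale

print([round(w, 2) for w in weights], round(total, 2))
print(posterior_odds(1 / 1000, total))
```

Working on a logarithmic scale turns repeated multiplication of Bayes factors into addition, which suits tallying scores by hand on printed sheets.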

One episode is particularly revealing as to the importance with which the 
Prime Minister, Winston Churchill, viewed the cryptanalytic work at Bletchley. 
On October 21, 1941, frustrated by bureaucratic inertia, Turing, Welchman, 
Alexander, and Milner-Barry wrote a letter directly to Churchill (headed ‘Secret 
and Confidential; Prime Minister only’) complaining that inadequate personnel 
had been assigned to them; immediately upon its receipt Churchill sent a memo to 
his principal staff officer directing him to “make sure they have all they want on 
extreme priority and report to me that this had been done” (Hodges, 1983, pp.
219-221).

Much of I. J. Good’s own work in statistics during the decades immediately
after the end of the war was a natural outgrowth of his cryptanalytic work during 
it; this includes both his 1950 book Probability and the Weighing of Evidence; and 
his papers on the sampling of species (e.g., Good, 1953) and the estimation of 
probabilities in large sparse contingency tables (much of it summarized in Good, 
1965). Some of this work was stimulated either directly (see, e.g., Good, 1973, p. 
936) or indirectly (the influence being somewhat remote, however, in the case of 
contingency tables) by Turing’s ideas: 


Turing did not publish these war-time statistical ideas because, after the war, he was too busy 
working on the ground floor of computer science and artificial intelligence. I was impressed by 
the importance of his statistical ideas, for other applications, and developed and published some 
of them in various places. [Good, 1992, p. 211]


Niels Hendrik Abel and Equations
of the Fifth Degree 


Michael I. Rosen 


This paper is dedicated to the memory of my close friend and colleague Kenneth 
Ireland. 


In most textbooks it is stated that Abel was the first to prove that the general 
equation of the fifth degree cannot be solved in radicals. However, Abel’s proof is 
almost never presented. Instead, the theorem is proved by means of Galois theory. 

Abel published his first proof of this theorem (at his own expense) in 1824 [1, 
Vol 1], and a longer more elaborate version appeared in Crelle in 1826 [1, Vol. 1]. 
E. Galois was thirteen years old in 1824. His spectacular paper on the theory of 
equations was submitted to, and rejected by, the French Academy of Science in 
1830. It wasn’t published until 1846, fourteen years after his death. For details of 
this sad story the reader can consult the very interesting book of Harold Edwards 
[5]. From all this it is clear that Abel’s proof could not have used Galois theory. 
How then did he do it? The purpose of this article is to provide an answer to this 
question which will be easily accessible to a modern reader familiar with the 
elements of the theory of fields. The proof we will give is not identical with that of 
Abel, but is in the spirit of his proof and uses nothing that was unavailable to him. 
Both before and after the proof we try to put things in historical context, and 
indicate how matters developed after 1826. In particular we will discuss the earlier 
work of P. Ruffini and the later work of Galois, as well as a pretty contribution of 
L. Kronecker. 

Of course, other authors have discussed this material. What’s new here is 
mainly the mode of presentation and the arrangement of the proofs. R. Ayoub’s 
article on Ruffini [2] gives an excellent historical and mathematical overview of the 
theorem. J. P. Tignol’s recent book on the theory of equations [7] gives among 
other things a history of the subject from ancient times up to the era of Galois. 
Both these sources discuss Abel’s contributions. Nevertheless, we feel that a 
relatively brief and accessible exposition of these matters from a somewhat
different point of view may be of interest to readers who are unfamiliar with this 
fascinating piece of mathematical history. 


SECTION 1. The solution of the quadratic equation $x^2 + ax + b = 0$ goes back
to antiquity. The roots are

$$x_1, x_2 = \frac{-a \pm \sqrt{a^2 - 4b}}{2}.$$


The solution of the cubic equation $x^3 + ax^2 + bx + c = 0$ was not discovered
until the 16th century. Around 1515, S. del Ferro found a solution, but did not
publish it. The solution was rediscovered in 1535 by N. Fontana, nicknamed
Tartaglia, who also kept it a secret until it was coaxed out of him by G. Cardano
and published in Cardano’s famous work “Ars Magna”. The first step is to reduce
the cubic $x^3 + ax^2 + bx + c$ to the form $x^3 + px + q$ by means of the substitution
$x \to x - \frac{a}{3}$. The solutions of $x^3 + px + q = 0$ are given by

$$x_1, x_2, x_3 = \sqrt[3]{-\frac{q}{2} + \sqrt{\left(\frac{p}{3}\right)^3 + \left(\frac{q}{2}\right)^2}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\left(\frac{p}{3}\right)^3 + \left(\frac{q}{2}\right)^2}}.$$


This formula must be supplemented by a rule telling how to choose the cube 
roots properly. 
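
The rule for choosing the cube roots can be made concrete. Writing a root as $x = u + v$ gives $x^3 = u^3 + v^3 + 3uv\,x$, so $x$ solves $x^3 + px + q = 0$ provided $u^3 + v^3 = -q$ and $uv = -p/3$. In the sketch below (an illustration under these assumptions, not the historical procedure) the first cube root $u$ runs over its three complex values and the second is then forced by $uv = -p/3$; the function name `cardano_roots` is hypothetical.

```python
import cmath


def cardano_roots(p, q):
    """Roots of x^3 + p*x + q = 0 via Cardano's formula, pairing cube roots so that u*v = -p/3."""
    disc = cmath.sqrt((p / 3) ** 3 + (q / 2) ** 2)
    # Either sign of the square root may serve as u^3; take the larger one to avoid dividing by 0.
    u_cubed = max(-q / 2 + disc, -q / 2 - disc, key=abs)
    if u_cubed == 0:  # happens only when p = q = 0, where x = 0 is a triple root
        return [0j, 0j, 0j]
    u = u_cubed ** (1 / 3)                 # one complex cube root
    omega = cmath.exp(2j * cmath.pi / 3)   # primitive cube root of unity
    roots = []
    for k in range(3):
        u_k = u * omega ** k               # the three possible values of the first cube root
        v_k = -p / (3 * u_k)               # the matching second cube root
        roots.append(u_k + v_k)
    return roots


# Example: x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3).
for r in cardano_roots(-7, 6):
    print(r, abs(r**3 - 7 * r + 6))
```

Choosing the two cube roots independently would give nine candidate sums, of which only three are roots; the constraint on the product $uv$ selects exactly those three.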

Soon after Tartaglia found his solution to the cubic equation, a solution was 
found to the quartic equation by L. Ferrari, the brilliant assistant to Cardano. We 
shall not write it down, but simply note that it involves nothing more than the 
rational operations of addition, subtraction, multiplication, and division, as well as 
extractions of square and cube roots. At this point it seemed reasonable to believe 
that the quintic equation could be solved by similar means, i.e. starting with the 
coefficients of the equations one should employ the rational operations together 
with the extraction of square, cube, and possibly fifth roots. However, in spite of 
much effort on the part of some of the best mathematicians in the world, no 
solution was found for over two and one half centuries. 

The first mathematician to state definitively that no solution existed was 
Ruffini. In 1799, Ruffini published a two volume treatise entitled “Teoria Generale
delle Equazioni” in which he claims to show that the general equation of the
fifth degree cannot be solved in radicals. For various reasons his results were
received with skepticism, even though as eminent a mathematician as A. Cauchy
found his arguments convincing. It turns out that although Ruffini did prove quite 
a lot, and did make important contributions, there was a significant gap in his 
proof. For all this the reader should consult [2]. We will discuss the gap in Ruffini’s 
proof in Section 4. 

While Ruffini’s proof did not find universal acceptance, his work did help turn 
the direction of research away from the problem of finding a solution to an 
equation of the fifth degree to the problem of showing that in general no such 
solution exists. It is in this atmosphere that the young Abel entered the picture. 


SECTION 2. In this section we set up notation, give a precise statement of the 
problem, review the solution via Galois theory, and begin our discussion of how 
Abel was able to find a proof which doesn’t use Galois theory; a necessity for him 
since Galois theory had not yet been invented when he discovered his proof. 

Throughout this paper we will use freely the notion of field, field extension, etc. 
Abel and his predecessors expressed themselves in different, but equivalent 
language. All fields will be assumed to be of characteristic zero. 

Let k be a field and $f(x) \in k[x]$ a monic polynomial. If

$$f(x) = (x - \theta_1)(x - \theta_2) \cdots (x - \theta_n)$$

in some extension field of k, we call $F = k(\theta_1, \theta_2, \ldots, \theta_n)$ a splitting field of f(x)
over k. In other words, a splitting field F of f(x) over k is the field obtained from
k by adjoining all the roots of f(x) = 0 to k.

A finite algebraic extension E/k is called a radical tower over k if there is a
series of intermediate fields

$$k = E_0 \subset E_1 \subset \cdots \subset E_{m-1} \subset E_m = E$$

such that for each $0 \le i < m$, $E_{i+1} = E_i(\sqrt[p_i]{a_i})$ where $p_i$ is a prime and $a_i \in E_i$.

We can now give the precise definition of what it means for an equation to be
solvable in radicals. Let $f(x) \in k[x]$ be a polynomial, and F a splitting field for
f(x) over k. We say that the equation f(x) = 0 is solvable in radicals if there is a
radical tower E/k such that $F \subset E$. This definition is just a way of saying, in the
language of fields, that the roots of f(x) = 0 can be obtained from the coefficients
by the successive use of the rational operations and the extraction of roots. It is not
clear, a priori, when f(x) = 0 is solvable in radicals, that F/k is itself a radical
tower. In fact, this is not true in general as can be seen by considering the
extension $\mathbb{Q}(\alpha)/\mathbb{Q}$ where $\alpha = 2\cos(2\pi/7)$. This extension is not a radical
extension. On the other hand, $\alpha$ is a root of the irreducible cubic $x^3 + x^2 - 2x - 1$
which splits into linear factors in $\mathbb{Q}(\alpha)$.
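
The claims about $\alpha = 2\cos(2\pi/7)$ can be checked numerically. The sketch below is a floating-point sanity check rather than a proof. It verifies that $\alpha$ is a root of the cubic, and that the other two roots, $2\cos(4\pi/7)$ and $2\cos(6\pi/7)$, equal $\alpha^2 - 2$ and $\alpha^3 - 3\alpha$ (by the double-angle and triple-angle formulas for the cosine), which is why the cubic splits into linear factors over $\mathbb{Q}(\alpha)$.

```python
from math import cos, pi


def cubic(x: float) -> float:
    """The polynomial x^3 + x^2 - 2x - 1."""
    return x**3 + x**2 - 2 * x - 1


alpha = 2 * cos(2 * pi / 7)
beta = alpha**2 - 2            # equals 2cos(4*pi/7), since cos(2t) = 2cos(t)^2 - 1
gamma = alpha**3 - 3 * alpha   # equals 2cos(6*pi/7), since cos(3t) = 4cos(t)^3 - 3cos(t)

for name, root in (("alpha", alpha), ("beta", beta), ("gamma", gamma)):
    # Each root is a polynomial in alpha with rational coefficients.
    print(name, root, cubic(root))
```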

One of the principal accomplishments of Galois was to give a beautiful criterion 
for when an equation f(x) = 0 was solvable in radicals. It is no real loss of 
generality to assume that f(x) is irreducible, and we do so. In this circumstance, 
Galois shows how to assign a group $G_f$ to f(x). It is a certain transitive subgroup
of the group of permutations of the roots of f(x).


Theorem (Galois). f(x) = 0 is solvable in radicals if and only if $G_f$ is a solvable
group.


We recall that a finite group G is solvable if there is a sequence of subgroups

$$(e) = G_0 \subset G_1 \subset G_2 \subset \cdots \subset G_m = G$$

such that for each i, $0 \le i < m$, $G_i$ is normal in $G_{i+1}$ and $p_{i+1} = [G_{i+1} : G_i]$ is
prime.

To use Galois’ theorem to show equations of the fifth degree and higher are not
in general solvable in radicals, one computes the Galois group of the general
equation of the nth degree and shows it is equal to $S_n$, the full symmetric group on
n letters. One then shows that $S_n$ is not a solvable group when $n \ge 5$. This is the
approach used in all modern texts in algebra. 
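
The non-solvability of $S_5$ can be confirmed by brute force. The sketch below does not construct a chain of normal subgroups with prime indices as in the definition above; it swaps in the derived series, relying on the standard fact that a finite group is solvable exactly when its derived series reaches the trivial group. Permutations are stored as tuples on {0, ..., n-1}, and all function names are hypothetical.

```python
from itertools import permutations


def compose(s, t):
    """Right-to-left product: (s t)(i) = s(t(i))."""
    return tuple(s[i] for i in t)


def inverse(s):
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)


def generated_subgroup(gens, n):
    """Closure of gens under products (in a finite group this already contains inverses)."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group


def derived_subgroup(group, n):
    """Subgroup generated by all commutators a b a^-1 b^-1 with a, b in group."""
    commutators = {compose(compose(a, b), compose(inverse(a), inverse(b)))
                   for a in group for b in group}
    return generated_subgroup(commutators, n)


def derived_series_orders(n):
    """Orders of S_n, its derived subgroup, and so on, until the series stops shrinking."""
    group = set(permutations(range(n)))
    orders = [len(group)]
    while True:
        nxt = derived_subgroup(group, n)
        if len(nxt) == len(group):
            return orders
        group = nxt
        orders.append(len(group))


for n in (3, 4, 5):
    print(n, derived_series_orders(n))
```

A series ending in 1 signals a solvable group; for n = 5 the series stalls at a subgroup of order 60, the alternating group.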

Let’s explain the notion of the “general equation of degree n”. Let k be a field
of characteristic zero, and let $s_1, s_2, \ldots, s_n$ be quantities which are algebraically
independent over k. Set $K = k(s_1, s_2, \ldots, s_n)$ and define

$$f(x) = x^n - s_1 x^{n-1} + s_2 x^{n-2} - \cdots + (-1)^n s_n \in K[x]$$

to be the general equation of degree n over k.

Suppose $f(x) = (x - x_1)(x - x_2) \cdots (x - x_n)$ in some extension field of K. Set
$L = K(x_1, x_2, \ldots, x_n)$. Clearly, L is a splitting field for f(x) over K. It is not hard
to show that $x_1, x_2, \ldots, x_n$ are algebraically independent over k (for details, see
[6]). Moreover, the $s_i$ are elementary symmetric functions of the $x_i$:

$$\begin{aligned}
s_1 &= x_1 + x_2 + \cdots + x_n \\
s_2 &= x_1 x_2 + x_1 x_3 + \cdots + x_{n-1} x_n \\
&\;\;\vdots \\
s_n &= x_1 x_2 \cdots x_n.
\end{aligned}$$

Each permutation of the $x_i$ induces an automorphism of L which leaves K
fixed. Moreover, the only elements of L which are left fixed by all such
automorphisms are the elements of K. Although this is stated in modern language, the
content is all quite old. The last part is easily seen from the theorem that a
symmetric polynomial is a polynomial in the elementary symmetric functions of the
variables. This fact goes back, in essence, to Newton (see [5]), and was used freely
by the predecessors of Abel and Galois. Reverting to modern language, we see
that L/K is a Galois extension with Galois group isomorphic to $S_n$, or, put
another way, $S_n$ is the Galois group of the general equation of degree n over k.
So, with this setup a certain amount of Galois theory was available to people like 
Vandermonde, Lagrange, Ruffini, and Abel. What was missing was the notion of 
normal subgroup. This fundamental notion is not visible in the work of these 
earlier authors. Thus they could not even formulate the notion of a solvable group, 
never mind prove Galois’ criterion for when an equation was solvable in radicals. 
The rudiments of group theory will play a big role in our treatment of Abel’s work 
on this problem, but nowhere will the notion of normal subgroup make an 
appearance. 
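
The relation between the coefficients $s_i$ of the general equation and its roots $x_i$, used throughout this section, can be checked on a concrete example. The sketch below (with hypothetical function names, and specific integers standing in for the indeterminates) expands the product of the factors $x - x_i$ and compares the coefficients with the signed elementary symmetric functions $(-1)^k s_k$.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(xs, k):
    """s_k: the sum of all products of k distinct entries of xs (s_0 = 1)."""
    return sum(prod(c) for c in combinations(xs, k))


def expand_monic(xs):
    """Coefficients of prod (x - x_i), highest degree first."""
    coeffs = [1]
    for r in xs:
        # Multiply the current polynomial by (x - r).
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


roots = [1, 2, 3, 5, 8]  # sample values for x_1, ..., x_5
expanded = expand_monic(roots)
signed_s = [(-1) ** k * elementary_symmetric(roots, k) for k in range(len(roots) + 1)]
print(expanded)
print(signed_s)
```

Permuting the entries of `roots` leaves both lists unchanged, which is the numerical shadow of the statement that every permutation of the $x_i$ fixes K.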


SECTION 3. We will now devote our attention to properties of the group $S_n$. Very
little will be needed. The elements of $S_n$ will be considered to be permutations of
the set $\{1, 2, \ldots, n\}$. For a polynomial in n variables $f(x_1, x_2, \ldots, x_n)$ and an
element $\sigma \in S_n$ we define

$$(\sigma f)(x_1, \ldots, x_n) = f(x_{\sigma(1)}, \ldots, x_{\sigma(n)}).$$

This action extends in a natural way to an action on rational functions. Now
define

$$\delta = \prod_{i<j} (x_i - x_j).$$

For any $\sigma \in S_n$, $\sigma\delta = \pm\delta$. If $\tau$ is a transposition, $\tau\delta = -\delta$. Then, $A_n \subset S_n$ is
defined to be the subset of $S_n$ consisting of all $\sigma \in S_n$ such that $\sigma\delta = \delta$. Clearly,
$A_n$ is a subgroup. Note that $[S_n : A_n] = 2$ since for any $\sigma \in S_n$ and any transposition
$\tau$, either $\sigma$ or $\tau\sigma$ is in $A_n$.


Facts: 


1. $S_n$ is generated by transpositions.
2. $A_n$ is generated by 3-cycles.
3. $A_n$ is generated by $m$-cycles, where $m$ is any odd number between 3 and $n$.


The first two facts are standard. To prove the third, note that an $m$-cycle is a
product of $m - 1$ transpositions, so if $m$ is odd an $m$-cycle is in $A_n$. On the other
hand, the identity

$$(a_1 a_2 a_3) = (a_2 a_1 a_3 a_4 \ldots a_m)(a_m a_{m-1} \ldots a_4 a_3 a_2 a_1)$$

shows that every 3-cycle is in the group generated by the $m$-cycles, so Fact 3
follows from Fact 2.
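The identity can be checked mechanically. The following is a minimal sketch, not part of the original argument: it assumes the labels $a_1, \ldots, a_m$ are the integers $1, \ldots, m$ and that products of permutations are read right to left (the right-hand factor acts first). The helper names `cycle` and `compose` are illustrative only.

```python
def cycle(points, n):
    """Return the cycle (p0 p1 ... pk) on {1..n} as a dict i -> image of i."""
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(points, points[1:] + points[:1]):
        perm[a] = b
    return perm

def compose(f, g):
    """Right-to-left product fg: apply g first, then f."""
    return {i: f[g[i]] for i in g}

m = 7  # any odd m >= 3 can be used; the labels a_i are taken to be 1..m
first = cycle([2, 1] + list(range(3, m + 1)), m)   # (a2 a1 a3 a4 ... am)
second = cycle(list(range(m, 0, -1)), m)           # (am ... a3 a2 a1)
product = compose(first, second)
three_cycle = cycle([1, 2, 3], m)                  # (a1 a2 a3)
print(product == three_cycle)
```

With the opposite (left to right) convention the same product is still a 3-cycle, but a different one, so the conclusion of Fact 3 does not depend on the convention.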

These facts are all we will need, but we add one more, due to Cauchy, since
Abel made use of it in his original proof.


4. Let $S_n$ act on $L = K(x_1, x_2, \ldots, x_n)$ as explained above. Let $p$ be the largest
prime less than or equal to $n$. Then, for $f \in L$, the number of distinct values
taken on by $f$ under the action of $S_n$ (i.e. the number of distinct rational
functions obtained from $f$ by permuting the variables) either is at least $p$ or is
at most 2.
Proof: Let $\sigma \in S_n$ be a $p$-cycle, and $\langle\sigma\rangle$ be the subgroup generated by $\sigma$. Define
$H = \{\tau \in \langle\sigma\rangle \mid \tau f = f\}$. Since $p$ is a prime, either $H = \langle\sigma\rangle$ or $H = \langle e\rangle$. Thus,
either $f, \sigma f, \ldots, \sigma^{p-1} f$ are all distinct, or $\sigma f = f$. If $f$ takes on fewer than $p$
values, we must have $\sigma f = f$ for all $p$-cycles. By Fact 3 this implies that $f$ is fixed
by $A_n$. Since $A_n$ has index 2 in $S_n$ the result follows.

It is, of course, true that this result is really about S, acting on an arbitrary set 
(same proof), but we have given the original formulation. 


SECTION 4. We have now assembled everything we shall need. We use the 
notation of Section 2, except that we now add the assumption, as Abel did, that 
sufficiently many roots of unity are in the ground field k. Readers who are 
bothered by this can take k = C, the complex numbers. Nothing essential is lost by 
this. 


Theorem (Abel). Let $f(x) = x^n - s_1 x^{n-1} + \cdots + (-1)^n s_n$ be the general equation
of degree $n$ over $k$. If $n \ge 5$ then this equation is not solvable in radicals.


Recall that $f(x) = (x - x_1)(x - x_2) \cdots (x - x_n)$ in $L = K(x_1, x_2, \ldots, x_n)$. $S_n$
acts on $L$ by permuting the $x_i$ and $K = k(s_1, s_2, \ldots, s_n)$ is the fixed field. If
$f(x) = 0$ were solvable in radicals, we would have a radical tower $E/K$ such that
$L \subset E$. Abel proceeds in two steps.


Step 1. If L is contained in a radical tower over K, then L/K is itself a radical 
tower. 


Step 2. If $n \ge 5$ then $L/K$ is not a radical tower (in fact, Abel restricts himself to
the case $n = 5$).


When he discovered his proof, Abel was unaware that the proof of Step 2 had 
been achieved years earlier by Ruffini. Ruffini did not give a proof of Step 1. It is 
not clear that he realized it was necessary. That was the gap in his proof! So, the 
proof of Step 1 was Abel’s essential contribution. 

In the next section we will give a proof of Step 2 by a method different from 
that of either Abel or Ruffini, although it is close in spirit to some of Ruffini’s later 
proofs (he gave many). I have adapted it from the classic text of Burnside and 
Panton [3]. The proof is short and elegant and uses nothing that was unavailable to 
either Abel or Ruffini. In Section 6 a proof of Step 1 will be given which is in 
essence that of Abel except for the use of the language of field theory and the 
inclusion of more details than are given in the original paper. At the end of that 
section we will also sketch a portion of Abel’s own proof of Step 1. 


SECTION 5. With the previous notation still in effect, we will show that $L/K$
cannot be a radical tower if $n \ge 5$.

We re-emphasize that we are assuming that the base field k contains as many 
roots of unity as needed. 

Suppose

$$K = K_0 \subset K_1 \subset K_2 \subset \cdots \subset K_m = L$$

is a radical tower. Then there is a prime $p$ and an element $a \in K^*$ such that
$K_1 = K(\sqrt[p]{a})$. We will show that $p = 2$ and that $a = b^2\Delta$ where $b \in K^*$ and $\Delta$ is
the symmetric function defined at the beginning of Section 3. In other words, $K_1$ is
uniquely determined and is the field $K(\sqrt{\Delta})$.


To prove this, set $\alpha = \sqrt[p]{a}$ and let $\tau \in S_n$ be a transposition. Applying $\tau$ to
$\alpha^p = a$ we find $\tau(\alpha)^p = a$ and consequently $(\tau(\alpha)/\alpha)^p = 1$, so that $\tau(\alpha) = \zeta\alpha$
where $\zeta^p = 1$. Now apply $\tau$ to both sides of the equation $\tau(\alpha) = \zeta\alpha$ and one finds
$\alpha = \tau(\zeta\alpha) = \zeta\tau(\alpha) = \zeta^2\alpha$. It follows that either $\tau(\alpha) \ne \alpha$ for some transposition $\tau$
and $p = 2$, or $\alpha$ is fixed by all transpositions. In the latter case, $\alpha$ is fixed by all of $S_n$
(use Fact 1 of Section 3) and is an element of $K$, contrary to assumption. The
proof shows that $\tau(\alpha) = \pm\alpha$ for all transpositions $\tau$. It follows that $\sigma(\alpha) = \pm\alpha$
for all $\sigma \in S_n$. Now, every three-cycle is a square; in fact, $(abc) = (acb)^2$. Thus, $\alpha$
is fixed by three-cycles and so by all of $A_n$ by Fact 2 of Section 3. Since $\tau(\alpha) = -\alpha$
for at least one transposition we now see this must be true for every transposition.
This property also holds for the polynomial $\delta$. Consequently, $\alpha/\delta$ is fixed by all
transpositions and so also by all elements in $S_n$. Thus, $\alpha/\delta = b \in K$ and so
$a = \alpha^2 = b^2\delta^2 = b^2\Delta$, which shows $K_1 = K(\sqrt{\Delta})$ as asserted.

We will show that when $n \ge 5$, $K_1$ has no radical extension in $L$. This will
complete the proof that $L/K$ is not a radical tower when $n \ge 5$.

Suppose $c \in K_1^*$ and $K_2 = K_1(\sqrt[q]{c})$, with $q$ a prime. Set $\gamma = \sqrt[q]{c}$. By the first part of the proof,
$A_n$ leaves $K_1$ fixed. Let $\rho$ be a three-cycle, and apply $\rho$ to both sides of the
equation $\gamma^q = c$. One deduces that $\rho(\gamma) = \zeta\gamma$ where $\zeta^q = 1$. Applying $\rho$ twice to
the equation $\rho(\gamma) = \zeta\gamma$ yields $\gamma = \rho^3(\gamma) = \zeta^3\gamma$. Thus, either $\rho(\gamma) = \gamma$ for all
three-cycles $\rho$ or $\rho(\gamma) \ne \gamma$ for some three-cycle and $q = 3$. In the former case, $\gamma$ is
fixed by $A_n$ and is in $K_1$, contrary to assumption. As we point out below, one need
not invoke Galois theory to establish this point. The conclusion is that $q = 3$. If
$n \ge 5$, $A_n$ is also generated by five-cycles (use Fact 3 of Section 3). Repeating the
same arguments shows that $q = 5$. This contradiction establishes the result.


Remark 1. The fact that $K_1 = K(\sqrt{\Delta})$ is the fixed field of $A_n$ can be proven as
follows. Let $\gamma \in L$ be fixed by $A_n$ and let $\tau$ be a transposition. It is not hard to see
that $a = \gamma + \tau(\gamma)$ and $b = \gamma \cdot \tau(\gamma)$ are both fixed by $S_n$ and so are in $K$. But $\gamma$ is
a root of the equation $x^2 - ax + b = 0$ and so must generate a quadratic extension
of $K$ in $L$. We have seen that $K_1 = K(\sqrt{\Delta})$ is the unique quadratic extension
of $K$ in $L$, and so $\gamma \in K_1$.


Remark 2. The group theory behind the proof is very simple and general. Let $G$
be a group which is generated by its elements of order $r$ and also by its elements
of order $s$. Suppose that $\gcd(r, s) = 1$. Then $G$ has no nontrivial abelian quotients.
For, if $G/N$ is an abelian quotient, it is generated by elements of order dividing $r$
and also by elements of order dividing $s$.
Since it is an abelian group it is annihilated by $r$ and $s$, and so also by 1, which is
the gcd of $r$ and $s$. Thus, $G = N$. In the above proof we took $G = A_n$ for $n \ge 5$,
and $r = 3$, $s = 5$.
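For readers who like to see the group theory confirmed by brute force, here is a small sketch for $n = 5$. It is only an illustration under stated assumptions: permutations are stored as tuples of images of $0, \ldots, 4$, and the helper names (`compose`, `cycle_type`, `generated_subgroup`) are hypothetical, not taken from any library. It builds the subgroup generated by all 3-cycles and the subgroup generated by all 5-cycles and compares them.

```python
from itertools import permutations

def compose(f, g):
    """Right-to-left product of permutations stored as tuples of images of 0..n-1."""
    return tuple(f[g[i]] for i in range(len(g)))

def cycle_type(p):
    """Sorted list of cycle lengths of the permutation p."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i]
            length += 1
        lengths.append(length)
    return sorted(lengths)

def generated_subgroup(generators, n):
    """Closure of the generators under multiplication (a subgroup, since the group is finite)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

n = 5
S5 = list(permutations(range(n)))
three_cycles = [p for p in S5 if cycle_type(p) == [1, 1, 3]]
five_cycles = [p for p in S5 if cycle_type(p) == [5]]
from_threes = generated_subgroup(three_cycles, n)
from_fives = generated_subgroup(five_cycles, n)
print(len(from_threes), len(from_fives), from_threes == from_fives)
```

Both subgroups have 60 elements and coincide, which is the case $r = 3$, $s = 5$ of the remark for $G = A_5$.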


SECTION 6. We now come to Abel’s proof of Step 1: if $L/K$ is contained in a
radical tower it is a radical tower. We continue to assume that
$K = k(s_1, s_2, \ldots, s_n)$ and that $L$ is the splitting field of the generic polynomial
$x^n - s_1 x^{n-1} + \cdots + (-1)^n s_n$. However, the arguments are quite general and
apply whenever $L/K$ is the splitting field of a separable polynomial over $K$, and $K$
contains sufficiently many roots of unity.

All the results we will need are in Abel’s original article [1, pp. 66—87] with 
perhaps different formulations. The first Lemma is now a standard result. 
Lemma 1. Let $F$ be a field containing a primitive $q$'th root of unity. If $a \in F^*$ is not
a $q$'th power, then $x^q - a$ is irreducible.


If $\alpha$ is a root of $x^q - a = 0$ then every $\gamma \in F(\alpha)$ can be written in the form

$$\gamma = a_0 + a_1\alpha + \cdots + a_{q-1}\alpha^{q-1} \qquad (1)$$

where the $a_i$ are in $F$.


Lemma 2. Assume that $x^q - a \in F[x]$ is irreducible and that $\alpha$ is a root. Let $\gamma$ be
an element of $F(\alpha)$ with $\gamma \notin F$. Then there is a $\beta \in F(\alpha)$ such that $\beta^q \in F$ and

$$\gamma = b_0 + \beta + b_2\beta^2 + \cdots + b_{q-1}\beta^{q-1}$$

where $b_0, b_2, \ldots, b_{q-1} \in F$.


Proof: Write $\gamma$ as in equation (1) above. Let $1 \le k < q$ be the smallest integer
such that $a_k \ne 0$. Set $\beta = a_k\alpha^k$. Clearly, $\beta^q \in F$. For $1 \le m < q$ we can find
integers $r$ and $s$ such that $0 < s < q$ and $rq + sk = m$. Then

$$\alpha^m = (\alpha^q)^r (\alpha^k)^s = c_m\beta^s \quad \text{with } c_m \in F.$$

The desired expression for $\gamma$ now follows by substitution into equation (1).
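The integers $r$ and $s$ in the proof exist because $k$ is invertible modulo the prime $q$. A minimal sketch of how they can be found by direct search follows; the function name `exponent_pair` and the sample values $q = 7$, $k = 3$ are illustrative assumptions, not part of Abel's argument.

```python
def exponent_pair(q, k, m):
    """Return integers (r, s) with 0 < s < q and r*q + s*k == m.

    Assumes q is prime and that q divides neither k nor m.
    """
    for s in range(1, q):
        if (m - s * k) % q == 0:
            return (m - s * k) // q, s
    raise ValueError("no solution; check that q is prime and divides neither k nor m")

q, k = 7, 3
pairs = {m: exponent_pair(q, k, m) for m in range(1, q)}
for m, (r, s) in pairs.items():
    print(m, r, s)
```

As $m$ runs over $1, \ldots, q-1$ the exponent $s$ takes each of these values exactly once, and $m = k$ gives $s = 1$, which is why $\beta$ itself appears with coefficient 1 in the expression for $\gamma$.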


Lemma 3. Let $q$ be a prime, and $\zeta$ a primitive $q$'th root of unity. Then, for each
integer $i$,

$$1 + \zeta^i + \zeta^{2i} + \cdots + \zeta^{(q-1)i} = \begin{cases} 0 & \text{if } q \text{ does not divide } i, \\ q & \text{if } q \text{ divides } i. \end{cases}$$

Proof: Again, this is standard. If $q$ divides $i$ the result is clear. If $q$ doesn’t divide $i$
one uses the formula for the sum of a geometric series.
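Written out, the geometric series step reads as follows. If $q$ does not divide $i$ then $\zeta^i \ne 1$, so the denominator below is nonzero.

```latex
\[
  \sum_{j=0}^{q-1} \zeta^{ij}
  \;=\; \frac{(\zeta^{i})^{q} - 1}{\zeta^{i} - 1}
  \;=\; \frac{(\zeta^{q})^{i} - 1}{\zeta^{i} - 1}
  \;=\; \frac{1 - 1}{\zeta^{i} - 1}
  \;=\; 0 .
\]
```

If $q$ divides $i$, every term $\zeta^{ij}$ equals 1 and the sum of the $q$ terms is $q$.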


Lemma 4. Consider the extension $L/K$. Let $\gamma \in L$. Then the irreducible polynomial
for $\gamma$ over $K$ splits into linear factors in $L[x]$.


Proof: Let $\gamma_1, \gamma_2, \ldots, \gamma_m$ be the distinct values (conjugates) of $\gamma$ under the action
of the symmetric group. Then $g(x) = (x - \gamma_1)(x - \gamma_2) \cdots (x - \gamma_m)$ has coefficients
which are invariant under $S_n$ and so are elements of $K$. The irreducible
polynomial for $\gamma$ over $K$ must divide $g(x)$ and the result follows. (Of course, it is
easy to see that the irreducible polynomial for $\gamma$ over $K$ is $g(x)$.)


We now come to the main lemma which contains the crux of the argument. 
Roughly speaking, it asserts that if a radical extension containing K is intersected 
with L, the resulting pair of fields is again a radical extension. It might be 
worthwhile at this point to remind the reader once more that we are assuming all 
the roots of unity that arise are in the base field. If this assumption is not made, 
the result may well be false. 


Lemma 5. Let $E/K$ be an extension field, $q$ a prime, and $a \in E$ an element such
that $x^q - a \in E[x]$ is irreducible. Let $\alpha$ be a root of $x^q - a = 0$. Set $M = E(\alpha) \cap L$
and $M_0 = E \cap L$. If $M \ne M_0$ then $M/M_0$ is a radical extension. More precisely,
there is a $\beta \in M$ such that $\beta^q \in M_0$ and $\beta$ generates $M$ over $M_0$.
Proof: Let $\gamma \in M$, $\gamma \notin M_0$. By Lemma 2, we can find a $\beta \in E(\alpha)$ such that
$\beta^q = b \in E$ and

$$\gamma = b_0 + \beta + b_2\beta^2 + \cdots + b_{q-1}\beta^{q-1}$$

where the $b_i \in E$. Let $g(x) \in K[x]$ be the irreducible polynomial for $\gamma$ over $K$,
and set

$$G(x) = g(b_0 + x + b_2x^2 + \cdots + b_{q-1}x^{q-1}).$$

$G(x)$ is in $E[x]$ and has $\beta$ for a root. By Lemma 1, $x^q - b \in E[x]$ is irreducible.
Thus $x^q - b$ divides $G(x)$. It follows that $G(\zeta^i\beta) = 0$ where $\zeta$ is a primitive $q$'th
root of unity, and $i$ is any integer, and so the numbers

$$\begin{aligned}
\gamma = \gamma_1 &= b_0 + \beta + b_2\beta^2 + \cdots + b_{q-1}\beta^{q-1} \\
\gamma_2 &= b_0 + \zeta\beta + b_2\zeta^2\beta^2 + \cdots + b_{q-1}\zeta^{q-1}\beta^{q-1} \\
&\;\;\vdots \\
\gamma_q &= b_0 + \zeta^{q-1}\beta + b_2\zeta^{2(q-1)}\beta^2 + \cdots + b_{q-1}\zeta^{(q-1)(q-1)}\beta^{q-1}
\end{aligned} \qquad (2)$$

are all roots of $g(x)$. By Lemma 4, we see that the numbers $\gamma_1, \gamma_2, \ldots, \gamma_q$ are all
in $L$ (we implicitly assume that $L$ and $E(\alpha)$ are contained in some common
extension field). Multiply the $i$'th equation in (2) by $\zeta^{1-i}$ and add all the resulting
equations. Using Lemma 3, we find

$$\beta = \frac{1}{q} \sum_{i=1}^{q} \zeta^{1-i}\gamma_i \in L.$$

Thus $\beta \in L \cap E(\alpha) = M$ and $\beta^q = b \in L \cap E = M_0$.
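The averaging step that recovers $\beta$ from the conjugates $\gamma_1, \ldots, \gamma_q$ can be illustrated numerically. The sketch below is only a sanity check with made-up complex numbers standing in for $\beta$ and the coefficients $b_j$ (with $b_1 = 1$); nothing in it depends on the fields $E$ and $L$ of the lemma.

```python
import cmath

q = 5
zeta = cmath.exp(2j * cmath.pi / q)        # a primitive q-th root of unity
beta = 0.7 + 0.4j                          # stand-in numerical value for beta
b = [2.0, 1.0, -1.5, 0.25, 3.0]            # b_0, b_1 = 1, b_2, ..., b_{q-1}

def gamma(i):
    """gamma_i = sum over j of b_j * zeta^((i-1) j) * beta^j, for i = 1, ..., q."""
    return sum(b[j] * zeta ** ((i - 1) * j) * beta ** j for j in range(q))

# Multiply the i-th value by zeta^(1-i), add, and divide by q.
recovered = sum(zeta ** (1 - i) * gamma(i) for i in range(1, q + 1)) / q
print(abs(recovered - beta))
```

The printed difference is at the level of floating-point rounding: by Lemma 3 every power $\beta^j$ with $j \ne 1$ is multiplied by a vanishing sum of roots of unity, and only the term in $\beta$ survives.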

It remains to show that $\beta$ generates $M$ over $M_0$. Let $\gamma \in M$. We can write

$$\gamma = c_0 + c_1\beta + c_2\beta^2 + \cdots + c_{q-1}\beta^{q-1}, \qquad c_i \in E.$$

It will suffice to show that the coefficients $c_i \in E \cap L = M_0$. To do this one
repeats the above argument to show that

$$\gamma_i = c_0 + c_1\zeta^{i-1}\beta + \cdots + c_{q-1}\zeta^{(i-1)(q-1)}\beta^{q-1}$$

is in $L \cap E(\alpha) = M$ for $i = 1, 2, \ldots, q$. Multiply $\gamma_i$ by $\zeta^{k(1-i)}$ and add up over $i$
ranging from 1 to $q$. Using Lemma 3 once more we find

$$q\,c_k\beta^k = \sum_{i=1}^{q} \zeta^{k(1-i)}\gamma_i \in M.$$

Since $\beta \in M$, it follows that $c_k \in M \cap E = M_0$, which completes the proof.
This argument is so pretty and ingenious, one is lost in admiration! We are now 
ready to state and prove the main result. 


Theorem. If L/K is contained in a radical tower, then L/K is itself a radical tower. 


Proof: Suppose that $E/K$ is a radical tower and that $L \subset E$. We have

$$K = E_0 \subset E_1 \subset E_2 \subset \cdots \subset E_m = E$$

where $E_{i+1} = E_i(\sqrt[q_i]{a_i})$, $q_i$ being a prime, and $a_i \in E_i$.
Now, consider the tower

$$K = E_0 \cap L \subseteq E_1 \cap L \subseteq \cdots \subseteq E_m \cap L = L. \qquad (3)$$

If $E_{i+1} \cap L = E_i \cap L$ there is nothing that need be said. If $E_{i+1} \cap L \ne E_i \cap L$
then Lemma 5 shows that $E_{i+1} \cap L / E_i \cap L$ is a radical extension (of degree $q_i$).
Thus, after eliminating equalities, equation (3) demonstrates $L$ as a radical tower
over $K$.
This completes the proof of Step 1 of Section 4. Since we proved Step 2 in the
last section, the proof that the general equation of degree 5 or greater cannot be 
solved in radicals is now complete. 


Abel’s original proof of Step 2 is different from the one we have given, and we 
want to give an idea of how he did it. To do this we sketch Abel’s proof that any 
radical extension of K inside L has degree 2. We will assume, as Abel did, that we 
are dealing with the general equation of degree 5. 

Let $K \subset K_1 \subset L$ and suppose $K_1 = K(\alpha)$ where $\alpha^q = a \in K$ and $q$ is a prime.
Let $m$ be the number of distinct values that $\alpha$ takes on under the action of $S_5$ (i.e.
the number of distinct conjugates of $\alpha$). As we have seen, $m$ is the degree of the
irreducible equation for $\alpha$ over $K$. Since $x^q - a$ is irreducible it follows that
$m = q$. Since $|S_5| = 120$, $q$ must divide 120, i.e. $q = 2$, 3, or 5. By Cauchy’s result,
Fact 4 of Section 3, $q$ cannot equal 3. Thus $q = 2$ or $q = 5$, and we must show
$q = 5$ is impossible. Abel does this by first showing that the only fields between $K$
and $L$ of degree 5 over $K$ are the fields $K(x_i)$, where $i = 1, 2, 3, 4, 5$. If $q = 5$ we
can then assume that $K(\alpha) = K(x_1)$. By modifying $\alpha$, if necessary, as in the proof
of Lemma 2, we can write

$$x_1 = a_0 + \alpha + a_2\alpha^2 + a_3\alpha^3 + a_4\alpha^4, \qquad a_i \in K.$$

Applying the same technique as in Lemma 5 one shows

$$\alpha = \tfrac{1}{5}\left(x_1 + \zeta^{-1}x_2 + \zeta^{-2}x_3 + \zeta^{-3}x_4 + \zeta^{-4}x_5\right) \qquad (4)$$

where $\zeta$ is a primitive fifth root of unity. The contradiction arises from the fact
that under the action of $S_5$, $\alpha$ has five values, but the right hand side of (4) has 120
values.

Abel goes on to show that $K_1 = K(\sqrt{\Delta})$ and that $K(\sqrt{\Delta})$ has no radical
extensions in $L$. The proof of the latter assertion is similar to the one we have just
given. If I am not mistaken there is a minor flaw in Abel’s proof that $K_1 = K(\sqrt{\Delta})$,
but this is very minor and is easily corrected.


SECTION 7. We conclude by indicating some further developments. In this 
section we continue to assume that we are in characteristic zero, but will no longer 
demand that roots of unity be in the base field. 

Abel was fascinated with the theory of equations. He published three articles on 
the subject, and a fourth appears among his posthumous work (see item XVIII of 
Volume II of his collected works [1]). He was at work on a major new memoir on 
this theory when he died, tragically, at the early age of 27. 

Having proved that the general equation of degree 5 or greater cannot be solved 
in radicals, the thrust of his later work was to find conditions on special equations 
which insure that they can be solved in radicals. His best known result in this 
direction is the following proposition. 


Proposition 1 (Abel). Let $f(x) \in k[x]$ and suppose that $\theta_1, \theta_2, \ldots, \theta_n$ are its roots
in some extension field of $k$. Suppose each $\theta_i$ is a rational function of $\theta_1$, i.e. each
$\theta_i = R_i(\theta_1)$ where $R_i(x) \in k(x)$. Suppose further that for each pair $i$ and $j$ we have

$$R_i(R_j(\theta_1)) = R_j(R_i(\theta_1)).$$

Then $f(x) = 0$ is solvable in radicals.


The reader can see that the hypotheses can be translated as follows. The
splitting field of $f(x)$ is generated by $\theta_1$, and the Galois group of $f(x)$ is abelian.
Thus the proposition is a consequence of Galois theory, though this is not, of
course, how Abel proved it. It is probably because of this result that Abel’s name is
attached to groups in which the elements commute with one another.

Abel never published a general criterion for when an equation is solvable in 
radicals, but in a letter to Crelle dated October 18, 1828 (see [1], Vol. 2), he writes 


“Si trois racines d’une équation quelconque irréductible dont le degré est un nombre premier,
sont liées entre elles de sorte que l’une de ces racines puisse être exprimée rationellement par les
deux autres, l’équation en question sera toujours résoluble à l’aide de radicaux.”


Roughly translated, this reads “If every three roots of an irreducible equation of
prime degree are related to one another in such a way that one of them may be
expressed rationally in terms of the other two, then the equation is solvable in
radicals”. Abel gives no indication of how he came to this result, or how he proved
it. It is remarkable in part because the statement is almost identical to one of the 
principal results of Galois’ fundamental memoir of 1830, which as we have already 
pointed out was not published until 1846. Here is Galois’ statement of the result as 
translated into English by Harold Edwards in [5]. 


Proposition 2 (Galois). In order for an irreducible equation of prime degree to be 
solvable in radicals it is necessary and sufficient that once any two of the roots are 
known, that the others can be deduced from them rationally. 


One can rephrase this result in more modern language. Let $f(x) \in k[x]$ be
irreducible of prime degree, and $\theta_1, \theta_2, \ldots, \theta_n$ be its roots. Given any three roots
$\theta_i, \theta_j, \theta_m$, there exists a rational function $R(x, y) \in k[x, y]$ such that $\theta_m = R(\theta_i, \theta_j)$.
Or, more simply, the splitting field of $f(x)$ is generated by any two of its roots.

Considering the simplicity and beauty of this result, it is somewhat surprising
that it is not better known. A proof is outlined in the exercises to Section 8, 
chapter 4 of [6]. A complete proof can be found in Section 5 of Chapter 14 of [7]. 
Edwards [5] also gives a complete proof, but since he uses Galois’ original language 
it is somewhat hard to read. 

It might be objected that this criterion is not useful because the hypothesis is 
very difficult to check. As it turns out, it can be quite useful. In 1856, L. Kronecker 
proved the following interesting result (see [4]). 


Proposition 3 (Kronecker). Let Q be the rational numbers, and suppose f(x) € 
Q[ x] is an irreducible polynomial of prime degree. If f(x) = 0 is solvable in radicals, 
then either f(x) has exactly one real root, or all its roots are real. 


Kronecker’s proof uses the methods of Abel. He was clearly unaware of Galois’
work, since this proposition is an immediate corollary of Proposition 2. If $\theta_1$ and $\theta_2$
are any pair of real roots, and $R(x, y) \in Q[x, y]$, then clearly $R(\theta_1, \theta_2)$ is also real,
and so all the roots must be real.

It is easy to use Proposition 2 to produce polynomials in $Q[x]$ which are not
solvable in radicals. For example, let $q \ge 5$ and $p$ be primes, and $a \ge 2$ be an
integer. Let $f(x) = x^q - apx - p$. By Eisenstein’s criterion, $f(x)$ is irreducible.
We claim it has exactly three real roots. For $x$ large and negative, $f(x)$ is negative.
At $x = -1$, $f(-1) = -1 + p(a - 1) > 0$. At $x = 0$, $f(0) = -p < 0$. Finally,
when $x$ is large and positive, $f(x)$ is positive. By the intermediate value theorem,
$f(x)$ has at least three real roots. However, $f'(x) = qx^{q-1} - ap$ has exactly two
real roots, so $f(x)$ must have exactly three real roots. By Kronecker’s result it
follows that $f(x) = 0$ cannot be solved in radicals. The simplest special case is
$x^5 - 4x - 2$.
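The root count for the special case $x^5 - 4x - 2$ can be illustrated numerically. The following sketch is an added check, not part of the original argument: it evaluates the sign of $f$ at the sample points $-2, -1, 0, 2$ (our choice) and locates one root in each interval by bisection.

```python
def f(x):
    return x**5 - 4*x - 2

def bisect(lo, hi, steps=60):
    """Locate a root of f in [lo, hi], assuming f changes sign there."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if (f(lo) < 0) == (f(mid) < 0):
            lo = mid      # no sign change on [lo, mid]; the root is in [mid, hi]
        else:
            hi = mid
    return (lo + hi) / 2

# Signs of f at these points alternate -, +, -, +, giving three sign changes.
points = [-2, -1, 0, 2]
signs = [f(x) > 0 for x in points]
roots = [bisect(a, b) for a, b in zip(points, points[1:])]
print(signs, roots)
```

Since $f'(x) = 5x^4 - 4$ has only two real zeros, Rolle's theorem rules out a fourth real root, so the three roots found here are all of them and the remaining two roots are complex.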
It is of some interest to point out that an $l$-adic version of Proposition 3 is valid.


Proposition 3l. Suppose $f(x) \in Q[x]$ is of prime degree. Let $Q_l$ denote the field of
$l$-adic numbers. If $f(x) = 0$ is solvable in radicals, then either exactly one root of $f(x)$
is in $Q_l$ or all its roots are in $Q_l$.


As before, the proof is an immediate consequence of Proposition 2. 
Here is an example of how to use this. Consider the polynomial

$$f(x) = x^5 + 3x^4 + 3x^3 + 6x^2 + 3x + 6.$$

$f(x)$ is an Eisenstein polynomial at 3 and so it is irreducible over $Q$. Considering
$f(x)$ modulo 2 we find

$$f(x) \equiv x^5 + x^4 + x^3 + x \equiv x(x + 1)(x^3 + x + 1) \pmod 2.$$

Since $f(x)$ has exactly two roots modulo 2, both of which are simple roots, one can
invoke Hensel’s lemma to conclude that $f(x)$ has exactly two roots in $Q_2$. It
follows from Proposition 3l that $f(x) = 0$ cannot be solved in radicals.
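The arithmetic in this example is easy to confirm by machine. The sketch below assumes the polynomial is $x^5 + 3x^4 + 3x^3 + 6x^2 + 3x + 6$, stored as a coefficient list with the constant term first; the helper `poly_mul_mod` is a hypothetical name introduced here for illustration.

```python
def poly_mul_mod(a, b, modulus):
    """Multiply polynomials given as coefficient lists (lowest degree first), mod `modulus`."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % modulus
    return out

f = [6, 3, 6, 3, 3, 1]            # 6 + 3x + 6x^2 + 3x^3 + 3x^4 + x^5

# Eisenstein at 3: f is monic, 3 divides every other coefficient,
# and 9 does not divide the constant term.
eisenstein_at_3 = all(c % 3 == 0 for c in f[:-1]) and f[-1] == 1 and f[0] % 9 != 0

# x * (x + 1) * (x^3 + x + 1), reduced modulo 2
product = poly_mul_mod(poly_mul_mod([0, 1], [1, 1], 2), [1, 1, 0, 1], 2)
f_mod_2 = [c % 2 for c in f]

# Roots of f modulo 2, and roots of the cubic factor modulo 2
roots_mod_2 = [x for x in range(2) if sum(c * x**i for i, c in enumerate(f)) % 2 == 0]
cubic_roots = [x for x in range(2) if (x**3 + x + 1) % 2 == 0]
print(eisenstein_at_3, product == f_mod_2, roots_mod_2, cubic_roots)
```

The cubic factor has no root modulo 2, so 0 and 1 are the only roots of $f$ modulo 2, in agreement with the count used in the text.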

We conclude by using these ideas to answer the question: do there exist
polynomials in $Q[x]$ of prime degree, all of whose roots are real, but which are not
solvable in radicals? Here is an amusing way to show that the answer is yes. Let $p$
and $l$ be distinct primes. Let $f_1(x)$, $f_2(x)$, and $f_3(x) \in Q[x]$ be polynomials of the same
prime degree $q \ge 5$ such that $f_1(x)$ is an Eisenstein polynomial at $p$, $f_2(x)$ has
exactly two distinct roots modulo $l$, and $f_3(x)$ has $q$ distinct real roots. Use the
weak approximation theorem to find a polynomial $f(x) \in Q[x]$ which is $p$-adically
close to $f_1(x)$, $l$-adically close to $f_2(x)$, and close in archimedean absolute value to
$f_3(x)$. Then, since $f(x)$ is $p$-adically close to $f_1(x)$ it is irreducible. Since it is close
to $f_3(x)$ in the archimedean absolute value, it has all its roots real. Finally, since it
is $l$-adically close to $f_2(x)$, it has exactly two distinct roots in $Q_l$ and by Proposition
3l it is not solvable in radicals.

All of this provides a nice, if simple, example of the fruitful way old and new 
mathematics can be combined to good effect. 
The Great Marble Race:
An Assignment Gone Wrong 


Benny Evans and Jerry Johnson 


As many educators have discovered, it is not always easy to produce computer- 
related activities that are accessible to students and actually contribute to their 
understanding of mathematics. Occasionally the most carefully designed assign- 
ments in the hands of students produce unexpected results. This is not necessarily 
bad—on the contrary, it may be a learning opportunity. The purpose of this note is 
to discuss an interesting example of this. 

Before we start, we will say a word about software. We use the computer 
algebra system DERIVE® in our classes and what follows reflects this fact. There 
are several other such programs on the market, most notably Maple™ and 
Mathematica®, with which the exercise we describe below can be analyzed as well. 
The most important thing is the mathematics, not the software, but users of other 
programs may observe some differences in the results. As we will see, the essence 
of the problem we discuss is structural, so any software should ultimately lead to 
the same difficulty. 


BACKGROUND. In 1990 the authors produced a book of computer-based activi- 
ties entitled Uses of Technology in the Mathematics Curriculum. Written under NSF 
grant USE 8950044, the book features substantial laboratory exercises whose
solutions require a computer algebra system or other computational aid. They 
were the end result of our own experience, as well as that gained from participants 
at an NSF Workshop we held in 1989. This planning made us confident that we 
had anticipated most responses our students were likely to give... but not all. 
Here’s the story of what happened with one of these problems. 

The following is part of an exercise from our book that we assigned to students 
in Calculus II. The complete version actually begins with a “warm-up” which is a 
straightforward solved problem to provide a context for the computer commands 
necessary for the solution of the full exercise. We added necessary DERIVE 
instructions and we took some liberty—to be precise, the correct model is a 
frictionless bead sliding down a ramp, since a rolling ball picks up rotational 
energy. For a nice discussion and analysis of the Brachistochrone problem see [3]. 


MLRC ASSIGNMENT. MATH 2265 
THE GREAT MARBLE RACE: THE BRACHISTOCHRONE 


This is a marble race. We start our marbles at the origin and let them roll down
a decreasing path to the point $(\pi, -2)$. (We have chosen this particular point to
make our results nicer.) The trick is, we each get to select the path we want the
marble to follow. I choose the Brachistochrone, given by $(\pi t - \sin \pi t, \cos \pi t - 1)$,
$0 \le t \le 1$. Someone else has already chosen the straight line, $(\pi t, -2t)$, $0 \le t \le 1$,
and a third contestant has chosen the parabolic arc $(\pi t, 2(t - 1)^2 - 2)$, $0 \le t \le 1$.
You may select any curve you want. Just be sure your parametric curve decreases
from the origin to $(\pi, -2)$ as $t$ moves from 0 to 1.

DERIVE is a registered trademark of Soft Warehouse, Inc.


1. (a) What path do you wish to use for the marble race?

(b) For the path you have chosen, show that $(x(0), y(0)) = (0, 0)$ and
$(x(1), y(1)) = (\pi, -2)$.

2. To decide who wins, we need to calculate the time required for a marble to
roll down a path given by $(x(t), y(t))$, $0 \le t \le 1$, from the origin to the point
$(\pi, -2)$. The following formula is derived by equating the increase in kinetic
energy gained by the marble with the potential energy it loses in getting to a
spot on the curve a distance $y$ below the $x$-axis.

$$T = \int_0^1 \sqrt{\frac{(x'(t))^2 + (y'(t))^2}{-2g\,y(t)}}\; dt$$

where $g$ is the constant of gravitational attraction near the surface of the
earth.

3. For each of the following curves, provide an integral that gives the time to
finish. You should evaluate the integrals for the brachistochrone and the
straight line by hand, then ask DERIVE to approXimate the answer.
DERIVE cannot evaluate the integral for the parabola in closed form. (If you
issue the Simplify command, DERIVE will try for a while and then give up.)
It is likely to have a similar difficulty with your curve and you will have to ask
DERIVE to approXimate again. In each approximation, you may leave the
gravitational constant $g$ as a letter. For example, your answer may look like
$(1/\sqrt{g})\,5.12345$. Fill in the following table:


curve              integral        exact time      time to 6 places

Brachistochrone
straight line
parabola
your curve


4. Who won the race? 

5. Based on the outcome of the marble race, formulate a conjecture about a 
physical property of the Brachistochrone. (Only clear English sentences are 
acceptable here.) 


NOTE: The full exercise continues with an analysis of the tautochrone property
of the curve.


THE PHILOSOPHY BEHIND THE EXERCISE. We liked this problem for sev- 
eral reasons. For one, the answer frequently runs counter to the intuition of many 
students that a straight ramp will win the race. It also encourages experimentation. 
Students can compute the time integral for different paths using DERIVE to do 
the calculations, which would be prohibitive otherwise. The proof that the
Brachistochrone is the path of least time is pretty challenging for calculus students, but
this gives them the opportunity to test a few paths to find evidence for believing it.
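For reference, the two integrals that students are asked to do by hand work out as follows. This is a sketch that assumes the time integral has the form $T = \int_0^1 \sqrt{\big((x')^2 + (y')^2\big) / (-2g\,y)}\;dt$ and uses the parametrizations given in the assignment.

```latex
% Brachistochrone: x = \pi t - \sin \pi t, \; y = \cos \pi t - 1
\[
  x'(t)^2 + y'(t)^2
  = \pi^2 (1 - \cos \pi t)^2 + \pi^2 \sin^2 \pi t
  = 2\pi^2 (1 - \cos \pi t)
  = -2\pi^2\, y(t),
\]
\[
  T_{\text{brach}}
  = \int_0^1 \sqrt{\frac{-2\pi^2\, y(t)}{-2g\, y(t)}}\; dt
  = \int_0^1 \frac{\pi}{\sqrt{g}}\; dt
  = \frac{\pi}{\sqrt{g}} .
\]
% Straight line: x = \pi t, \; y = -2t
\[
  T_{\text{line}}
  = \int_0^1 \sqrt{\frac{\pi^2 + 4}{4 g t}}\; dt
  = \frac{\sqrt{\pi^2 + 4}}{2\sqrt{g}} \int_0^1 t^{-1/2}\, dt
  = \frac{\sqrt{\pi^2 + 4}}{\sqrt{g}}
  \approx \frac{3.724}{\sqrt{g}} .
\]
```

The integrand for the Brachistochrone is constant, so that time integral is not singular at all, while the straight line already gives an improper integral with a $t^{-1/2}$ singularity at $t = 0$.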


STUDENT RESPONSE. This is not an easy problem, and some students had 
difficulty coming up with a parametric representation of a path that decreases 
from the origin to $(\pi, -2)$. Since we provided linear and quadratic paths we
anticipated that they would probably choose a cubic path because it is the next 
polynomial degree. 

The first semester we gave the exercise, all the students did what we expected 
and came up with the corresponding times for these paths. The better students 
actually tried to justify the brachistochrone’s victory by giving some nice heuristic 
arguments about how the marble should accelerate quickly at the outset. The poor 
students produced the same kind of work they would without a computer. One 
made such a mess of the integral that it turned out to be a complex number, which 
DERIVE correctly calculated. The student dutifully reported the finish time as 
3 + 4i. 


THE ASSIGNMENT GONE WRONG. The next semester, a student tried something
that got him very excited. He came to Professor Johnson’s office to announce
that he had chosen the path $(\pi t, -2\sqrt{t})$ and obtained a finish time of 3.08830,
which is less than the Brachistochrone time of $\pi$. (Remark: These answers should
be divided by $\sqrt{g}$, but we are only interested in a comparison, so we have
suppressed this constant here and in all answers that follow.)

We examined the student’s work for an error, but there was none. The student 
actually began to wonder out loud if he had uncovered an age-old mistake. At this 
point we cranked up DERIVE and took a look! 

The time integral cannot be evaluated in closed form and must be approxi- 
mated; moreover the time integral is improper (or singular) in the sense that the 
integrand approaches infinity as t approaches 0 (see Figure 4). DERIVE signals its 
uncertainty about the accuracy of an approximation by beeping and displaying the 
phrase “dubious accuracy” on the screen. The accuracy is indeed dubious, and the 
problem became clear. 

We changed DERIVE’s precision setting from the default of 6 digits to 10, and
after a few seconds DERIVE produced an approximation of 3.154709288, a
number that would look better to Bernoulli. (In the next section we’ll discuss why
this was a futile thing to do.) This calculation took 30.5 seconds on our 386 33MHz
machine, whereas the six digit approximation took 3.2 seconds. Nevertheless, we
could simply instruct students to set the precision to 10 digits before beginning the
exercise and warn them to be patient. But another clever student used a segment
of the ellipse $(\pi t, -2\sqrt{2t - t^2})$. With 10 digit precision the ellipse gives a value of
3.132361048 in 37.3 seconds, making the ellipse a winner over the Brachistochrone
even with a 10 digit setting.

It seems that the idea behind the exercise has been defeated, and the teacher 
may be placed in the awkward position of pronouncing the Brachistochrone to be 
the winner in spite of the evidence that students have gathered to the contrary. 


WHERE DO WE GO FROM HERE? Is this all DERIVE’s fault for providing 
poor approximations? As we will see, this is not the case. The “dubious accuracy” 
message is evidence of a structural impediment and increasing the precision 
settings, as we did above, is likely to fail. 
[DERIVE algebra screen, "Four ramps from (0,0) to ($\pi$,-2)":
#1: $(\pi t, -2\sqrt{-t(t - 2)})$
#2: $(\pi t - \sin(\pi t), \cos(\pi t) - 1)$
#3: $(\pi t, -2\sqrt{t})$
#4: $(\pi t, 2t(t - 2))$]

Figure 1. Four parametric paths.


A look at Figure 1 shows a DERIVE screen with four paths. #1 is the second 
student’s ellipse mentioned above, #2 is the Brachistochrone, #3 is the first 
student’s parabola, and #4 is the parabola we suggested in the exercise. Figure 2 
shows the graphs numbered accordingly (note the scale at the bottom). 

Why the difference among them in the accuracy of the corresponding time 
integrals? To help understand, look at Figure 3 which displays the integrands of 
the time integrals for the four paths. Figure 4 shows the graphs of these integrands 
numbered to correspond to the four graphs in Figure 2. 


[DERIVE 2D-plot screen omitted.]

Figure 2. Graphs of the four paths.


We can see that three of the integrals are singular: the integrands approach © 
as x — 0. The approximation techniques rely on a finite number of function 
values, so if the function changes dramatically over any of the corresponding 
subintervals, accuracy is uncertain. The rapid growth of the function near the 
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wtt-2)-4t -2t + 1) 


60: 32 
(~ t (t - 2)) 
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2 


COMMAND: IWARSy Build Calculus Declare Expand Factor Help Jump soLve Manage 
Options Plot Quit Remove Simplify Transfer moVe Window approX 

Enter option 

User C: BRACH .MTH Free:71% Insert Derive Algebra 


Figure 3. The integrands of the time integrals of the four paths. 


[DERIVE 2D-plot screen showing the graphs of the four integrands.]


Figure 4. Graphs of the integrands of the time integrals of the four paths. 


singularity at the origin or the occurrence of a large percentage of the area under
the curve near the y-axis contribute to this difficulty and make results of “dubious
accuracy” more likely. A look at the cases where the two student answers were less
than π (#1 and #3 in Figures 1, 2, and 4) shows that this is happening.

The computing time for most arithmetic operations grows quadratically with the 
precision. Increasing DERIVE’s precision enables more refinement which further 
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increases the computation time and memory consumption. This can improve the 
approximation for a given function, but it is easy to produce functions that will 
defeat any precision setting. See [2] for more on these matters. 

We were encouraged by students who chose curves that drop very steeply and 
then quickly flatten out, giving a large acceleration at the origin. However, this 
forces the time integral to get a large part of its value near the singularity. The 
more students try to increase acceleration in this way, the more trouble the 
computer algebra system is likely to have approximating the time integral. 

We can also make this situation as bad as we want by changing parameterizations.
Let’s look at the parabolic arc #4 in Figure 1 and consider the parameterizations
$(\pi t^r,\ 2(1 - t^r)^2 - 2)$ where r is a small positive number. The graphs of the
integrands of the time integral for r = 0.1, 0.2, ..., 1 are displayed in Figure 5, and
descend as r decreases. Once again, please note that the curves in Figure 5 are not 
the ramps that our marble rolls down, but the graphs of the functions we integrate 
to get the time of descent. These integrals all represent the time down the same 
parabolic ramp and so must have the same value, but DERIVE produces dramati- 
cally different approximations (with “dubious accuracy” messages where appropri- 
ate). 


[DERIVE 2D-plot screen showing the integrands for r = 0.1, 0.2, ..., 1.]


Figure 5. Integrands for different parameterizations of a parabolic arc. 


As r gets smaller the integral takes more of its value from a region near the 
singularity, x = 0. Thus, an approximation determined by a regular partition will 
become successively poorer. Using the default 6-digit precision, DERIVE produces 
an approximation of 2.98912 for r = 0.4 and 1.41820 for r = 0.1 with “dubious 
accuracy” warnings. In the latter case the time is less than half the Brachisto- 
chrone’s. Thus, the idea of increasing DERIVE’s precision is defeated by a 
sufficiently small choice of r. 




For Riemann integrals, there are ways to determine bounds on the error in an 
approximation, such as those given by Simpson’s rule and the Trapezoidal rule, but 
if we ask our computer algebra system to approximate an improper integral, we 
have no way to assure the accuracy of the answer. This is not the fault of the 
computer algebra system. No matter what general approximation technique we 
employ, there is a fairly straightforward integral that will defeat it. See [2] for more 
on these matters. What are we to do? 


RESOLVING THE PROBLEM: GIVE YOUR COMPUTER ALGEBRA SYSTEM 
SOME HELP. There are techniques that can help approximate improper integrals 
in general. First, one should determine that the integral converges before turning 
it over to a computer algebra system. If the integral diverges, there is no point in 
appealing to the computer. 

If the integral extends over an infinite interval, a substitution can be made to 
change it to a finite interval. For example, 

\[ \int_0^\infty f(x)\,dx = \int_0^1 \frac{f(-\ln x)}{x}\,dx \quad\text{or}\quad \int_1^\infty f(x)\,dx = \int_0^1 \frac{f(x^{-1})}{x^2}\,dx. \]

(DERIVE does this automatically, but if we leave the process up to our computer 
algebra system, we have no control over what the transformed integral looks like.) 

Once we have an improper integral over a finite interval our best strategy is to 
transform it into a Riemann integral if that is possible. For a large class of 
functions this is easy. For example, if f(t) is continuous on (0,1] such that
$\int_0^1 f(t)\,dt$ converges and $\lim_{t \to 0^+} t^r f(t)$ is finite for some r with 0 < r < 1 (which is
true for algebraic functions), then the substitution $t = x^{1/(1-r)}$ will transform the
improper integral into a Riemann integral.

By the way, in the case of our time integral, making a substitution in the integral 
is the same as changing the parameterization of the path—a good exercise for a 
calculus student. We see this in the next example. 

Consider the parameterization of the parabolic arc $(\pi t^{1/10},\ 2(1 - t^{1/10})^2 - 2)$.
This yields the time integral

\[ \int_0^1 f(t)\,dt \quad\text{where}\quad f(t) = \frac{1}{20}\sqrt{\frac{\pi^2 + 16(t^{1/5} - 2t^{1/10} + 1)}{t^{19/10}\,(2 - t^{1/10})}}. \]

As was noted earlier, DERIVE provides a “dubious accuracy” warning and the
unsatisfactory approximation 1.41820 for this integral. It is important to emphasize
that if we did not know in advance to expect the answer to be larger than π we
might have no idea how bad this approximation really is. Since $\lim_{t \to 0^+} t^{19/20} f(t)$ is
finite we should make the substitution $t = x^{20}$ to yield the Riemann integral

\[ \int_0^1 \sqrt{\frac{\pi^2 + 16(x^4 - 2x^2 + 1)}{2 - x^2}}\,dx \]

for which DERIVE gives the approximation 3.27633. It is also instructive to plot
the graph of the integrand, which suggests that the integral is larger than π.

This trick may also work for transcendental functions. For example, using its
6-digit precision, DERIVE provides the approximation 0.886195 for $\int_0^1 \sqrt{-\ln t}\,dt$.
Notice that $\lim_{t \to 0^+} t^{1/2}\sqrt{-\ln t}$ is finite, and hence the substitution $t = x^2$ produces
the integral $\int_0^1 2\sqrt{2}\,x\sqrt{-\ln x}\,dx$ for which DERIVE gives 0.886239. As it
turns out, the original estimate was not too bad, but we had no way of knowing this
before we transformed the integral.
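
The effect of this substitution can be checked numerically. The sketch below is not DERIVE's method: it applies a plain composite midpoint rule (chosen here only because it never evaluates the integrand at the endpoints) to both forms of the integral and compares them with the closed form $\int_0^1 \sqrt{-\ln t}\,dt = \Gamma(3/2) = \sqrt{\pi}/2 \approx 0.886227$. The function names and the choice of 1000 subintervals are illustrative assumptions.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule with n equal subintervals (illustrative only)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def original(t):
    # Integrand of the improper integral; unbounded as t -> 0+.
    return math.sqrt(-math.log(t))

def transformed(x):
    # After t = x**2: sqrt(-ln(x**2)) * 2x = 2*sqrt(2)*x*sqrt(-ln x), bounded on (0, 1).
    return 2.0 * math.sqrt(2.0) * x * math.sqrt(-math.log(x))

exact = math.sqrt(math.pi) / 2.0  # Gamma(3/2)
approx_original = midpoint(original, 0.0, 1.0, 1000)
approx_transformed = midpoint(transformed, 0.0, 1.0, 1000)
print(exact, approx_original, approx_transformed)
```

Both approximations should land close to the closed form; the point of the transformed version is that its integrand is bounded, so the usual tools for Riemann integrals apply to it.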

Of course there are integrals that do not satisfy the above criterion. For
example, if $f(t) = t^{-1} e^{-\sqrt{-\ln t}}$ then $\int_0^1 f(t)\,dt$ converges to 2 (DERIVE will find
the antiderivative in closed form), but $\lim_{t \to 0^+} t^r f(t) = \infty$ for all 0 < r < 1. (We
leave the proof to the reader.) The substitution $t = x^r$ for a large value of r may
still be advisable since it shifts area under the integrand away from the singularity,
but in general there may be no recipe to tell us how to transform such improper
integrals into Riemann integrals.


THE ERROR. Approximations without error bounds are not worth much. Thus, 
while DERIVE’s approximations for the transformed Riemann integrals were 
improved in the examples we examined, we can’t be sure of this in general. 
According to its User Manual, DERIVE employs “an extrapolated adaptive
Simpson’s rule” to approximate Riemann integrals. What this means is that
DERIVE adjusts the distribution of partition points in its application of Simpson’s
rule. For more discussion of this method, see [1]. Simpson’s rule has a standard
error bound in terms of the maximum of the absolute value of the fourth derivative
on the interval in question. In the two examples discussed above the derivative
tends to ∞ at one of the end points. Thus, we appear to be back to square one. Do
we have reason to accept any of the approximations that DERIVE has produced? 

In general, this is a serious problem that has no easy answer, but in special cases 
there is help. If f(x) is monotone on the interval [a, b], and if the interval is
partitioned into n equal subintervals, then the difference between the upper and
lower Riemann sums is $|f(b) - f(a)|(b - a)/n$, so the average of the upper and
lower Riemann sums (the trapezoidal rule) differs from the integral by no more
than $|f(b) - f(a)|(b - a)/(2n)$.

Therefore, in simple cases we may apply the trapezoidal rule on intervals where
the function is monotone. It is easy to calculate the required Riemann sums with
DERIVE and to decide what value of n ensures the desired error bound. Let us
look once again at the parameterization of the parabolic arc discussed above. With
DERIVE’s help we find that the function

\[ \sqrt{\frac{\pi^2 + 16(x^4 - 2x^2 + 1)}{2 - x^2}} \]

has a single extremum on the interval [0, 1] at x = 0.853490. If we wish to get an
answer with error less than $10^{-3}$ we apply the trapezoidal rule on the interval
[0, 0.853490] with n = 1300 and on the interval [0.853490, 1] with n = 400.
DERIVE produces the value 2.83576 + 0.440535 = 3.27630 and we finally have a
number whose accuracy is understood.
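
The same computation can be sketched outside DERIVE. The code below is a minimal illustration, not the authors' worksheet: it applies the trapezoidal rule separately on the two intervals where the integrand is monotone and evaluates the bound $|f(b) - f(a)|(b - a)/(2n)$ on each. The function names are assumptions; the split point and the values of n are the ones quoted above.

```python
import math

def integrand(x):
    # Time integrand for the parabolic arc after the substitution t = x**20.
    return math.sqrt((math.pi**2 + 16.0 * (x**4 - 2.0 * x**2 + 1.0)) / (2.0 - x**2))

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def monotone_bound(f, a, b, n):
    """Error bound |f(b) - f(a)| (b - a) / (2n); valid only if f is monotone on [a, b]."""
    return abs(f(b) - f(a)) * (b - a) / (2 * n)

x_min = 0.853490  # location of the single extremum, as quoted in the text
left = trapezoid(integrand, 0.0, x_min, 1300)
right = trapezoid(integrand, x_min, 1.0, 400)
total = left + right
bound = (monotone_bound(integrand, 0.0, x_min, 1300)
         + monotone_bound(integrand, x_min, 1.0, 400))
print(left, right, total, bound)
```

The bound is crude (the trapezoidal rule is usually far more accurate on a smooth integrand), but it needs no derivative information, which is exactly what makes it usable here.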


CONCLUSION. This problem provides a natural setting where an improper (or 
singular) integral arises and a context for such questions as “What are the
difficulties involved in approximating such integrals?” “How does the parameterization
affect the integral?” “How can you help the computer produce estimates
that you can believe?” When we discussed these topics in class, we found an
interested audience because some of them had seen the difficulties. 

In this example students see that even the mighty computer does not provide an 
easy solution to every calculation, and it is not facility with software or formulas, 
but rather the ability to bring appropriate mathematics to bear that will solve 
problems. In particular, it shows that computation may lead one astray and that in 
the end, a mathematical proof is the essential tool. 

The unforeseen problem with this exercise was turned into a positive learning 
experience which probably makes it better, not worse, than we believed when we 
assigned it. The things that a modern computer algebra system can do are striking, 
and they provide mathematics educators with exciting tools that simply have never 
been available before, but it may be that the things a computer algebra system 
cannot do are just as important for pedagogical reasons. 
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Geometry and the Foucault Pendulum 


John Oprea 


Nature uses only the longest thread to weave her patterns, so each small piece of 
fabric reveals the organization of the entire tapestry. 
—Richard P. Feynman 


§1. INTRODUCTION. In 1851 Jean Foucault (1819-1868) built a pendulum consisting
of a heavy iron ball on a wire 200 feet long to demonstrate the rotation of
the Earth (see Figure 1a and Figure 1b). Foucault observed that such rotation
would cause the swing-plane of the pendulum to precess, or rotate, as time
went on, eventually returning to its original direction after a period of
$T = 24/\sin v_0$ hours (where $v_0$ denotes the latitude where the experiment takes place).


Figure 1a


In a recent New York Times interview [Ang], the distinguished scientist and 
author Stephen Jay Gould proclaimed, “I’ve never understood why every science 
museum in the country feels compelled to have one of these [a Foucault pendu- 
lum]. I still don’t understand how they work and I don’t think most visitors do 
either.” Gould is exactly right. Non-physicists generally have only the vaguest 
notion of how the behavior of the pendulum relates to the rotation of the Earth. 
The usual quite complicated analysis of this phenomenon of precession is in terms 
of rotating reference frames and the Coriolis force (see [Sym] and [Arn]). While 
these notions are part of elementary mechanics, they are not widely known among 
even mathematically aware non-physics students. 

Figure 1b. Path of the Pendulum


The purpose of this article is to present the behavior of the Foucault pendulum
as a simple consequence of doing Calculus on the sphere. This holonomy approach
to the pendulum is mentioned in [W-S] and [Mar p. 16], but the details in terms of
elementary Calculus do not seem to be well known. We believe this analysis of the
Pendulum deserves a wide audience because it provides a beautiful down-to-‘Earth’
example of mathematical modelling in the context of Geometry and Calculus.

While we only discuss the pendulum, the geometric concept of holonomy makes 
its presence felt in applied mathematics from optimal control to quantum mechan- 
ics (cf. [En1], [En2] and [W-S]). It is hoped that the mathematical description of the 
Foucault pendulum presented here will spur interest in applications of Differential 
Geometry and will be accessible to any student acquainted with multivariable 
calculus and a touch of linear algebra. 


§2. THE SPHERE. Our first step in analyzing Foucault’s pendulum is to understand
the geometry of the sphere. Consider a sphere (denoted by $S^2$) of radius R
with patch


\[ x(u, v) = (R\cos u\cos v,\ R\sin u\cos v,\ R\sin v), \]


where $0 < u < 2\pi$ and $-\pi/2 < v < \pi/2$. By ‘patch’ we mean a system of coordinates
on the sphere, such as spherical coordinates $(\rho, \theta, \phi)$ with a fixed radius $\rho = R$.
Note however that our patch differs from spherical coordinates in that v represents
the latitude on the sphere; that is, the angle up from the equator, not down
from the North Pole (see Figure 2).

The patch x has two special families of curves associated to it: the longitudes
$\beta(v) = x(u_0, v)$ obtained by setting u equal to a constant and the latitudes
$\alpha(u) = x(u, v_0)$ obtained by setting v equal to a constant. Since these curves are in
$\mathbb{R}^3$, their tangent vectors $\alpha'$ and $\beta'$ are given by differentiating each coordinate of
their expressions. For latitude and longitude tangent vectors respectively, we have


\[ \alpha' = (-R\sin u\cos v_0,\ R\cos u\cos v_0,\ 0), \]
\[ \beta' = (-R\cos u_0\sin v,\ -R\sin u_0\sin v,\ R\cos v). \]


Note that the dot product $\alpha' \cdot \beta'$ is zero, so that $\alpha'$ and $\beta'$ are perpendicular (or
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Figure 2 




Figure 3. Tangent Plane with Basis E1, E2 


orthogonal) for all u and v. In particular, $\alpha'$ and $\beta'$ form a basis for the tangent
plane $T_pS^2$ where $p = x(u_0, v_0)$. That is, every tangent vector w at x(u, v) may be
written in a unique way as $w = A\alpha' + B\beta'$ for some real numbers A and B (see
Figure 3).

This basis for the tangent plane may be extended to a basis for $\mathbb{R}^3$ itself by
taking a vector perpendicular to both $\alpha'$ and $\beta'$; namely, the cross product $\alpha' \times \beta'$.
In fact, things become simpler if we take unit vectors in the directions of $\alpha'$, $\beta'$ and
$\alpha' \times \beta'$ obtained by dividing these vectors by their lengths $|\alpha'|$, $|\beta'|$ and $|\alpha' \times \beta'|$.
The vectors of our basis are now,

\[ E_1 = \frac{\alpha'}{|\alpha'|} = (-\sin u,\ \cos u,\ 0), \qquad E_2 = \frac{\beta'}{|\beta'|} = (-\cos u\sin v,\ -\sin u\sin v,\ \cos v), \]

\[ U = \frac{\alpha' \times \beta'}{|\alpha' \times \beta'|} = (\cos u\cos v,\ \sin u\cos v,\ \sin v). \]


The basis $\{E_1, E_2, U\}$ provides a framework for comparing Euclidean geometry
of $\mathbb{R}^3$ to geometry seen from the perspective of a 2-dimensional resident of the
sphere. Because the perceptions of such a person are restricted to the 2-dimensional
space spanned by $E_1$ and $E_2$, any event or object in $\mathbb{R}^3$ is ‘seen’ by the
resident of the sphere only through its projection onto the tangent plane. In
particular, a vector w in $\mathbb{R}^3$ may be written uniquely as


\[ w = aE_1 + bE_2 + cU \]


but the resident of the sphere only sees $aE_1 + bE_2$. The viewpoint described here
is useful in forming analogies between Euclidean geometry and curved geometry.
For example, in $\mathbb{R}^3$ we know that lines, which may be parametrized by $\gamma(t) = p + tv$
for fixed p and v, are shortest paths between points. Further, from the parametrization,
it is clear that lines are characterized by having zero acceleration vectors.
By analogy, ‘shortest paths’ (or geodesics) on the sphere are characterized by
having zero acceleration vectors as perceived by residents of the sphere. That is, any
curve on the sphere with an acceleration vector entirely in the U-direction is a
geodesic. Such curves on the sphere turn out to be the great circles. In the next
section we carry this viewpoint further.


§3. PARALLEL VECTORS ON THE SPHERE. What does it mean to say that two
tangent vectors on the sphere in different tangent planes are parallel? It definitely
cannot mean, in general, that the two vectors are parallel in $\mathbb{R}^3$. For consider a
latitude circle on the sphere $S^2$ at latitude $v_0$,


\[ \alpha(u) = (R\cos u\cos v_0,\ R\sin u\cos v_0,\ R\sin v_0). \]


It is easy to compute that, in $\mathbb{R}^3$, $\alpha'(0)$ may be written as

\[ \alpha'(0) = -R\sin v_0\cos v_0\, E_2\!\left(\tfrac{\pi}{2}, v_0\right) + R\cos^2 v_0\, U\!\left(\tfrac{\pi}{2}, v_0\right) \]

with respect to the basis $\{E_1, E_2, U\}$ at $\alpha(\pi/2)$. The non-zero U-component shows
that no vector of the tangent plane at $\alpha(\pi/2)$ is $\mathbb{R}^3$-parallel to $\alpha'(0)$.

One way to compare vectors along a curve $\gamma(t)$ in $\mathbb{R}^3$ is to start with a tangent
vector $V_0$ at $\gamma(0)$ and create a field of tangent vectors V(t) at $\gamma(t)$ which is
differentiable in t. The rate of change in vectors along $\gamma$ may then be computed as
(d/dt)V(t). Further, we may say that a vector field V is parallel along $\gamma$ if
(d/dt)V(t) = 0 for all t. Of course this then implies that $V(t) = V_0$, a constant,
and this fits with our notion of parallelism in $\mathbb{R}^3$.

We may extend this idea in a simple way to a tangent vector field V(u) along a
latitude circle $\alpha(u)$ in $S^2$ by saying that V is parallel along $\alpha$ if (d/du)V(u) has no
$E_1(u)$ or $E_2(u)$ components. This means that (d/du)V(u) = C(u)U(u) for all u or,
equivalently, that the projection of (d/du)V(u) onto the tangent plane at $\alpha(u)$,
$\operatorname{Proj}_{TS^2}(d/du)V(u)$, is zero. We may think of this as saying that residents of the
sphere see no change in vectors along $\alpha$. (For readers versed in differential
geometry, note that we may avoid the covariant derivative here because $\alpha$ is a
constant-length u-parameter curve and V(u) is given in terms of u. Thus, covariant
differentiation in $\mathbb{R}^3$, which is coordinatewise directional differentiation, reduces
to ordinary differentiation d/du.)

To return to our latitude circle, let V(u) be a parallel vector field along the
latitude $\alpha(u)$. (We always assume that vectors are tangent to $S^2$.) Then we may
write $V(u) = A(u)E_1(u) + B(u)E_2(u)$. The first thing we notice is


Lemma. V has constant length.


Proof: Because V is parallel, (d/du)V(u) = C(u)U(u) and therefore,


\[ \frac{d}{du}\bigl(V(u) \cdot V(u)\bigr) = 2\,\frac{dV}{du}(u) \cdot V(u) = 2\,C(u)\,U(u) \cdot V(u) = 0, \]


since V(u) is tangent to the sphere and hence perpendicular to U(u).
Since $V \cdot V$ is constant, so is |V|. □


From our expression for V(u) we see that we must have $A(u)^2 + B(u)^2 =
|V|^2 = L^2$ where L is a constant. Therefore we may write $A(u) = L\cos\theta(u)$,
$B(u) = L\sin\theta(u)$ where $\theta(u)$ is the angle from V(u) to $E_1(u)$. We then have


\[ V(u) = L\cos\theta(u)\,E_1(u) + L\sin\theta(u)\,E_2(u). \]

From this expression it is clear that, in order to compute (d/du)V(u), we must first
compute $(d/du)E_1(u)$ and $(d/du)E_2(u)$. We do this coordinatewise.


\[ \frac{dE_1}{du} = (-\cos u,\ -\sin u,\ 0), \qquad \frac{dE_2}{du} = (\sin u\sin v_0,\ -\cos u\sin v_0,\ 0). \]


The reader may check that, in terms of the basis $\{E_1, E_2, U\}$, we have


Proposition.


\[ \frac{dE_1}{du} = \sin v_0\,E_2 - \cos v_0\,U, \qquad \frac{dE_2}{du} = -\sin v_0\,E_1. \]


Remark. Note that the Proposition says that neither $E_1$ nor $E_2$ is parallel along
$\alpha$.


The second thing we notice is that parallel vector fields always exist. In fact, the 
proof of this standard (but essential) result tells us precisely how vectors rotate to 
maintain parallelism. 


Theorem. Let $V_0$ be a tangent vector at $\alpha(0)$. Then there exists a parallel vector field
V along $\alpha$ with $V(0) = V_0$.


Proof: The expression above for V(u) shows that a prospective parallel vector field
V is determined by the angle $\theta(u)$. The condition that V be parallel will translate
below into a complete determination of $\theta(u)$, thus constructing the desired V. The
product and chain rule give


\[ \frac{dV}{du} = L\left[ -\sin\theta\,\frac{d\theta}{du}\,E_1 + \cos\theta\,\frac{dE_1}{du} + \cos\theta\,\frac{d\theta}{du}\,E_2 + \sin\theta\,\frac{dE_2}{du} \right]. \]


Using our previous calculations of the derivatives of $E_1$ and $E_2$ along $\alpha$, we obtain


\[ \frac{dV}{du} = L\left[ -\sin\theta\left(\sin v_0 + \frac{d\theta}{du}\right)E_1 + \cos\theta\left(\sin v_0 + \frac{d\theta}{du}\right)E_2 - \cos\theta\cos v_0\,U \right]. \]


Because a parallel V cannot have $E_1$ or $E_2$ components, and since $\sin\theta$ and $\cos\theta$
cannot be zero simultaneously, we must have $d\theta/du = -\sin v_0$ or


\[ \theta(u) = \theta(0) - \int_0^u \sin v_0\,du = \theta(0) - u\sin v_0. \]


This formula then defines $\theta$ and, hence, the parallel vector field V. □


Definition-Proposition. The angle of rotation as u varies from 0 to $2\pi$ is called the
holonomy along $\alpha$. By the proof of the Theorem above, the holonomy along $\alpha$ is
given by


\[ -2\pi\sin v_0. \]
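
The holonomy can also be observed numerically. The following sketch is an illustration that does not appear in the article: it approximates parallel transport around the latitude circle by repeatedly projecting a unit vector onto the next tangent plane (removing its U-component and renormalizing), and it accumulates the change in the angle measured against $E_1$, $E_2$. The function names and the step count are assumptions, and the scheme is only a discrete approximation of the condition that dV/du has no tangential component.

```python
import math

def frame(u, v):
    """Orthonormal frame (E1, E2, U) of the patch at longitude u, latitude v."""
    e1 = (-math.sin(u), math.cos(u), 0.0)
    e2 = (-math.cos(u) * math.sin(v), -math.sin(u) * math.sin(v), math.cos(v))
    n = (math.cos(u) * math.cos(v), math.sin(u) * math.cos(v), math.sin(v))
    return e1, e2, n

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def holonomy(v0, steps=20000):
    """Approximate total change of the angle theta after one trip around latitude v0."""
    e1, e2, n = frame(0.0, v0)
    vec = e1                      # start with theta(0) = 0
    theta_prev, total = 0.0, 0.0
    for k in range(1, steps + 1):
        u = 2.0 * math.pi * k / steps
        e1, e2, n = frame(u, v0)
        c = dot(vec, n)           # component along the new normal U
        vec = tuple(x - c * y for x, y in zip(vec, n))
        length = math.sqrt(dot(vec, vec))
        vec = tuple(x / length for x in vec)
        theta = math.atan2(dot(vec, e2), dot(vec, e1))
        # Unwrap the small angle increment into (-pi, pi].
        total += (theta - theta_prev + math.pi) % (2.0 * math.pi) - math.pi
        theta_prev = theta
    return total

print(holonomy(math.pi / 6), -2.0 * math.pi * math.sin(math.pi / 6))
```

For a latitude of 30 degrees the accumulated angle should be close to $-\pi$, in agreement with $-2\pi\sin v_0$.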


Remark. Of course, all of this may be done in complete generality. Standard (and 
very good) references on Differential Geometry are [O’N], [Spi] and [DoC]; general 
results on parallelism and the covariant derivative may be found there. 

The calculation of holonomy above says that parallel tangent vectors rotate by
$-2\pi\sin v_0$ as they move completely around a latitude circle. Of course, as the




Figure 4. A Parallel Vector Field on the Sphere 
terminology ‘parallel’ signifies, 2-dimensional residents of the sphere see the
vectors as parallel and so, from their viewpoint, not rotating at all. This may seem
contradictory since the angle between V(u) and $E_1(u)$ is changing with u, but it
must be remembered that the vector field $E_1$ along $\alpha$ is not parallel, so any angle
change may be attributed to the direction change of $E_1$. In fact, the product rule
guarantees that two parallel vector fields along a curve maintain the same angle
between their constituent vectors.


Exercise. What happens at the Equator and why is the Equator special among the 
circles of latitude? 


§4. THE FOUCAULT PENDULUM. In order to analyze the Foucault pendulum
from the viewpoint of geometry, assume the Earth to be non-rotating and the
pendulum to be situated at latitude $v_0$. Instead of the Earth rotating to move the
pendulum, we move the pendulum once around the latitude circle in 24 hours at
constant speed on this stationary Earth. This is clearly equivalent to the standard
situation. The long cable of the pendulum and the slow progression around the
latitude circle have two consequences (which are the usual physics arguments).

First, the long cable provides a relatively small swing for the pendulum which is
then approximately flat. Hence, we may consider each swing as a tangent vector to
the sphere. By orienting these vectors consistently, we obtain a vector field of
pendulum swing plane directions V. At each moment of time t there is such a swing
direction vector V(t) and all these vectors may be placed along the latitude circle
$\alpha(u)$ by associating a given moment of time t with the unique point describing the
pendulum’s movement along $\alpha(u)$. Hence we write V(u) for the swing plane vector
field.

Secondly, because we move around the latitude circle slowly, the consequent
centripetal force on the pendulum is negligible (≈ 1/290) compared with the
downward force mg. That says that the only force F felt by the pendulum is in the
normal direction U. Thus, the vertical swing plane of the pendulum experiences no
tangential force and so appears unchanging to a 2-dimensional resident of the
sphere. That is, projected to the tangent plane $TS^2$,


\[ \operatorname{Proj}_{TS^2}\,\frac{dV(u)}{du} = 0, \]

where the covariant derivative again reduces to the ordinary derivative due to our
special parametrization. By our earlier discussion, we then have


Theorem. The vector field V associated to the Foucault pendulum is parallel along a 
latitude circle. 


Of course, as we transport the Foucault pendulum once around the latitude
circle $\alpha$, holonomy rotates the parallel vector field V by $-2\pi\sin v_0$ radians.
In particular, the angular speed of this vector rotation is then $\omega = (2\pi\sin v_0$ rads/
24 hours). The equivalence of our geometric situation with the physical one then
gives


Theorem. The period of the Foucault pendulum’s precession is

\[ T = \frac{2\pi \text{ rads}}{\omega} = \frac{24}{\sin v_0} \text{ hours}. \]
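
As a quick numerical illustration of the period formula (a sketch, not part of the article), the function below evaluates 24 / sin(latitude) in hours. The function name, the use of degrees, the absolute value for southern latitudes, and returning infinity when the sine vanishes are all assumptions made for this example.

```python
import math

def precession_period_hours(latitude_deg, day_hours=24.0):
    """Period of the swing-plane precession, 24 / sin(latitude), in hours."""
    s = abs(math.sin(math.radians(latitude_deg)))
    if s == 0.0:
        # The formula has no finite value when sin(latitude) vanishes.
        return math.inf
    return day_hours / s

for lat in (90.0, 60.0, 30.0):
    print(lat, precession_period_hours(lat))
```

At the pole the swing-plane returns after 24 hours, and the period lengthens as the latitude decreases.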

Of course, this is precisely the period obtained in physics. Here, however, the
precession of the swing-plane of the Foucault pendulum results from the holonomy
along $\alpha$ induced by the curvature of the Earth. Further, since we view the whole
pendulum apparatus as stationary relative to the Earth, what can explain the 
observed precession of the swing-plane? As Foucault argued, we must have 


Corollary. The Earth rotates along its latitude circles. 


Exercise. Suppose a Foucault pendulum is transported around a latitude circle on 
a torus. (You should still assume the only force is normal to the torus.) Compute 
the holonomy and explain whether this experiment alone can tell you whether we 
live on a sphere or torus. 


Remark. While we have treated the pendulum because of its relative simplicity, a 
similar type of analysis can be made for one of the most useful of optimal control 
devices, the gyroscope. Indeed, in 1852 Foucault built a very refined gyroscope 
whose precession also demonstrated the Earth’s revolution. Foucault, in fact, 
coined the term gyroscope from the Greek gyros meaning ‘circle’ and skopein 
meaning ‘to view’ because his gyroscope allowed him to see the rotation of the 
Earth. For more on gyroscopes see [Sca] for example. 

In its own simple way, this mathematical analysis of the Foucault pendulum
epitomizes the physics of the 20th century: a physics which takes a decidedly
geometric view of Nature.
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Areas of Polygons Inscribed in a Circle 


David P. Robbins 


1. INTRODUCTION. Since a triangle is determined by the lengths, a, b, c of its 
three sides, the area K of the triangle is determined by these three lengths. The 
well-known formula 


\[ K = \sqrt{s(s - a)(s - b)(s - c)}, \tag{1.1} \]


where s is the semiperimeter (a + b + c)/2, makes this dependence explicit. (This 
formula is usually ascribed to Heron of Alexandria, c. 60 BC, although some 
attribute it to Archimedes.) 

When I was in about 7th grade I worked out Heron’s formula for myself by 
drawing an altitude and using two instances of the Pythagorean theorem. (I was 
unaware of this elegant factored form above.) My fascination with the way 
symmetry entered the formula has stayed with me for many years. 

For polygons of more than three sides, the lengths of the sides do not determine 
the polygon or its area. However, if we impose the condition that the polygon be 
convex and cyclic, (i.e., inscribed in a circle) then the area of the polygon is 
uniquely determined. Moreover, it is a symmetric function of the side lengths. The 
symmetry can be seen by regarding the polygon as the union of isosceles triangles 
each bounded by two radii and an edge of the polygon. From this point of view, we 
see that changing the order of the sides leaves the area unaffected. Given positive
real numbers $a_1, \ldots, a_n$, one can construct a convex n-gon with the $a_j$’s as the
lengths of the sides provided that the largest $a_j$ is smaller than the sum of the
remaining ones. In this case it is also possible to construct a convex cyclic n-gon
with the same sides and this cyclic n-gon has the largest area of all n-gons with the
given side lengths. The monograph of Coxeter and Greitzer [1, pages 56-60]
contains an interesting discussion which renewed my interest in the subject when I 
was teaching geometry at Phillips Exeter Academy almost 20 years ago. 

In particular the reader will find in [1] a formula analogous to (1.1), given by 
Brahmagupta in the seventh century, for the area K of a cyclic quadrilateral whose 
four sides have lengths a, b, c, and d. It is 


\[ K = \sqrt{(s - a)(s - b)(s - c)(s - d)}, \]


where again s is the semiperimeter (a + b + c + d)/2. Having read this section 
of [1] made me wonder what the formulas would be for polygons of more sides and 
I have worked sporadically on the problem since then. 

In this article I will present formulas, analogous to those of Heron and 
Brahmagupta, for the areas of the cyclic pentagon and cyclic hexagon. For a more 
detailed exposition see [2]. 

It may seem surprising that so long a time has elapsed between the discovery of 
the formula for the area of the cyclic quadrilateral and the one for the cyclic 
pentagon. We shall see that the calculations leading to the discovery of the 
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pentagon formula are so complex that it would have been quite difficult to carry 
them out without the aid of a computer. In fact after some study of the problem I 
thought it likely that, even if I were to discover the formula, its complexity would 
make it of little interest to write down. However it is possible to write the formulas 
for the areas of the cyclic pentagon and the cyclic hexagon in a compact form 
which is related to the formula for the discriminant of a cubic polynomial in one 
variable. 

A number of colleagues have made helpful suggestions, some of which have 
been incorporated in the exposition below. I would like particularly to acknowl- 
edge the contributions of Bradley Brock, Russell Kulsrud, David Lieberman, 
James Maiorana and Lee Neuwirth. 


2. AN ALGEBRAIC FORMULATION OF THE PROBLEM. The convexity condi- 
tion for polygons is algebraically unnatural. It is simpler to consider the following 
slightly generalized problem. Given positive real numbers $a_1, \ldots, a_n$, find the areas
of all n-gons whose side lengths are $a_1, \ldots, a_n$ and whose vertices lie on a circle.
Here an n-gon is a sequence of n points $P_1, \ldots, P_n$ in a plane. Its n side lengths
are the distances $P_1P_2, \ldots, P_{n-1}P_n, P_nP_1$. Note that these polygons need not be
convex and may intersect themselves. Let us define the area of a planar polygon
whose vertices are $P_1 = (x_1, y_1), \ldots, P_n = (x_n, y_n)$ to be


\[ \frac{1}{2}\left( \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix} + \begin{vmatrix} x_2 & y_2 \\ x_3 & y_3 \end{vmatrix} + \cdots + \begin{vmatrix} x_n & y_n \\ x_1 & y_1 \end{vmatrix} \right). \tag{2.1} \]


Defined this way the area is the sum of the areas of the components into which the
polygon divides the plane, with each component weighted by the winding number
of the polygon about a point in the component. The area can be negative and its
sign changes when the polygon is traversed backwards. However the new formulas,
like those of Heron and Brahmagupta, involve only the square of the area.
Heron’s formula can be restated


\[ 16K^2 = 2a^2b^2 + 2a^2c^2 + 2b^2c^2 - a^4 - b^4 - c^4 \tag{2.2} \]


so that $16K^2$ is equal to a polynomial with integer coefficients in the squares of
the sides of the triangle.
Brahmagupta’s formula is


\[ 16K^2 = 2a^2b^2 + \cdots + 2c^2d^2 - a^4 - b^4 - c^4 - d^4 + 8abcd \]

so that $16K^2$ is equal to a polynomial in the side lengths in which the exponents of
each term are either all even or all odd.


If in the right-hand side of Brahmagupta’s formula a single side length is
replaced by its negative, one obtains the equation


\[ 16K^2 = 2a^2b^2 + \cdots + 2c^2d^2 - a^4 - b^4 - c^4 - d^4 - 8abcd. \]


This has a natural geometric interpretation. It gives the area K, in the sense 
described above, of a nonconvex cyclic quadrilateral of side lengths a, b, c, d. 
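
These formulas are easy to spot-check on small examples. The sketch below is an illustration only, with assumed function names: it computes the signed area from the determinant sum (2.1) and compares $16K^2$ with the polynomial forms of Heron's and Brahmagupta's formulas for a right triangle, a rectangle, and the self-intersecting quadrilateral obtained by reordering the rectangle's vertices.

```python
def signed_area(points):
    """Half the sum of the 2x2 determinants of consecutive vertices, as in (2.1)."""
    n = len(points)
    total = 0.0
    for j in range(n):
        x1, y1 = points[j]
        x2, y2 = points[(j + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0

def heron_16k2(a, b, c):
    """Right-hand side of (2.2)."""
    return 2*a*a*b*b + 2*a*a*c*c + 2*b*b*c*c - a**4 - b**4 - c**4

def brahmagupta_16k2(a, b, c, d, sign=1):
    """Polynomial form for a cyclic quadrilateral; sign=-1 gives the nonconvex case."""
    pairs = a*a*b*b + a*a*c*c + a*a*d*d + b*b*c*c + b*b*d*d + c*c*d*d
    return 2*pairs - a**4 - b**4 - c**4 - d**4 + sign * 8*a*b*c*d

triangle = [(0, 0), (4, 0), (0, 3)]            # sides 4, 5, 3
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]   # sides 4, 3, 4, 3
bowtie = [(0, 0), (4, 0), (0, 3), (4, 3)]      # same vertices, sides 4, 5, 4, 5

print(16 * signed_area(triangle) ** 2, heron_16k2(4, 5, 3))
print(16 * signed_area(rectangle) ** 2, brahmagupta_16k2(4, 3, 4, 3))
print(16 * signed_area(bowtie) ** 2, brahmagupta_16k2(4, 5, 4, 5, sign=-1))
```

The self-intersecting case has signed area zero because its two lobes are traversed in opposite directions, which is the winding-number weighting described above.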


3. CONJECTURES ON AREAS OF CYCLIC POLYGONS. In this section I will 
describe some general conjectures about cyclic polygons. I still do not know how to 
prove them but the understanding they provided made it possible to discover the 
correct formulas for the cyclic pentagon and cyclic hexagon whose fairly simple 
rigorous proofs are indicated below. 

Notice that Heron’s formula states that $16K^2$ satisfies a monic polynomial of
degree 1, namely one of the form $u - \alpha$ where $\alpha$ is the right-hand side of (2.2). Its
(two) coefficients are symmetric polynomials with integer coefficients in the
squares of the sides of the triangle.

In the next section we shall see that, for a cyclic pentagon, $16K^2$ satisfies a
monic polynomial of degree 7. Each of its coefficients is a symmetric polynomial
with integer coefficients in the squares of the sides of the pentagon.

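More generally there are reasons (indicated below) to believe that, for a cyclic
polygon of 2m + 1 sides, $16K^2$ satisfies a monic polynomial of degree $\Delta_m$, where
the sequence


\[ \Delta_1, \Delta_2, \Delta_3, \Delta_4, \Delta_5, \ldots = 1, 7, 38, 187, 874, \ldots \]

is defined by


\[ \Delta_m = \sum_{k=0}^{m-1} (m - k)\binom{2m+1}{k} = \frac{1}{2}\left[ (2m + 1)\binom{2m}{m} - 2^{2m} \right] \]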


and that its coefficients are symmetric polynomials with integer coefficients in the 
squares of the sides of the polygon. 
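
The conjectured degrees are easy to tabulate. The sketch below computes $\Delta_m$ both as the sum over the number $k$ of backward sides and from the closed form, and compares the two; the function names are made up for this illustration.

```python
from math import comb


def delta_sum(m):
    """Delta_m as a sum: k backward sides, m - k possible trips around."""
    return sum((m - k) * comb(2 * m + 1, k) for k in range(m))


def delta_closed(m):
    """Delta_m from the closed form (1/2)[(2m+1) C(2m, m) - 2^(2m)]."""
    return ((2 * m + 1) * comb(2 * m, m) - 4 ** m) // 2


print([delta_sum(m) for m in range(1, 6)])
print(all(delta_sum(m) == delta_closed(m) for m in range(1, 30)))
```

The third value, 38, is the degree quoted for the heptagon later in the article.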

For polygons with $2m + 2$ sides there is an analogous conjecture that $16K^2$
satisfies one of two monic polynomials each of degree $\Delta_m$. Both of these polynomials
have coefficients which are themselves symmetric polynomials with integer
coefficients in the side lengths. In these symmetric polynomials every monomial 
consists entirely of even powers or entirely of odd powers of the side lengths. 
Moreover, the two monic polynomials are closely related. Either can be obtained 
from the other by replacing any single side by its negative. We have already 
observed this for cyclic quadrilaterals and we shall see that it also holds for the 
hexagon. 

Some parts of these conjectures are easily proved. For example, let us see why 
some algebraic relation always exists. It is helpful to employ a presentation of the 
problem in terms of complex numbers that will also be useful in the proofs of the 
formulas for the pentagon and hexagon. Assume that a polygon has its vertices on 
a circle centered at the origin in the complex plane. Suppose that these vertices are 
in order $v_1, \ldots, v_n$ and that the radius of the circle is $R$. Also let $v_{n+1} = v_1$ and
define the quotients

$$q_j = v_{j+1}/v_j, \qquad j = 1, \ldots, n.$$

Then, letting $a_j$ be the distance from $v_j$ to $v_{j+1}$, we have

$$a_j^2 = |v_{j+1} - v_j|^2 = R^2 (v_{j+1} - v_j)(1/v_{j+1} - 1/v_j) = R^2 (2 - q_j - q_j^{-1}). \tag{3.1}$$

Using the definition (2.1) of the area of a polygon we have

$$K = \frac{1}{2} \sum_{j=1}^{n} \operatorname{Im}(\bar{v}_j v_{j+1}) = \frac{1}{4i} \sum_{j=1}^{n} R^2 (v_{j+1}/v_j - v_j/v_{j+1}).$$

Hence

$$-16K^2 = R^4 (v_2/v_1 - v_1/v_2 + \cdots)^2 = R^4 (q_1 + \cdots + q_n - q_1^{-1} - \cdots - q_n^{-1})^2. \tag{3.2}$$

It follows that the $n + 1$ quantities $16K^2$ and $a_j^2$, $j = 1, \ldots, n$, are rational
functions of the $n + 1$ variables $R$ and $q_1, \ldots, q_n$. But the $q_j$'s satisfy the relation
$q_1 \cdots q_n = 1$ and are therefore algebraically dependent over the rational numbers.
Hence the functions $16K^2$ and $a_j^2$ must themselves be algebraically dependent
over the rationals.
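
Relations (3.1) and (3.2) can be checked numerically on any polygon inscribed in a circle centered at the origin. The following is a minimal sketch; the radius and vertex angles are arbitrary test data, and the helper names are not from the article.

```python
import cmath


def vertex_quotient_check(radius, angles):
    """Return both sides of (3.1) and (3.2) for a cyclic polygon.

    The vertices are radius * exp(i * angle), taken in the given order.
    """
    v = [radius * cmath.exp(1j * t) for t in angles]
    nxt = v[1:] + v[:1]                      # v_{j+1}, with v_{n+1} = v_1
    q = [w / z for z, w in zip(v, nxt)]      # vertex quotients q_j

    sides_sq = [abs(w - z) ** 2 for z, w in zip(v, nxt)]
    rhs_31 = [radius ** 2 * (2 - qj - 1 / qj) for qj in q]

    # Area from the determinant (shoelace) formula (2.1).
    area = sum(z.real * w.imag - w.real * z.imag for z, w in zip(v, nxt)) / 2
    lhs_32 = -16 * area ** 2
    rhs_32 = radius ** 4 * sum(qj - 1 / qj for qj in q) ** 2
    return sides_sq, rhs_31, lhs_32, rhs_32


sides_sq, rhs_31, lhs_32, rhs_32 = vertex_quotient_check(
    2.5, [0.1, 1.0, 2.3, 3.9, 5.2])
print(max(abs(a - b) for a, b in zip(sides_sq, rhs_31)))
print(abs(lhs_32 - rhs_32))
```

Both printed differences should be at the level of floating-point rounding.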

The reason for believing that the degree of the polynomial for $(2m + 1)$-gons is
$\Delta_m$ is that $\Delta_m$ appears to be the largest number of distinct areas that can occur
with a given set of side lengths.

Suppose that n = 2m +1. It seems that the maximum number of areas is 
achieved when the n side-lengths are distinct but nearly equal. Imagine a circle of 
variable radius and let us try to inscribe a polygon with sides of the given lengths in 
the circle by picking an arbitrary starting point and laying out the edges, one at a 
time, with the given lengths. When the radius is too large, we will not reach the 
starting point when we have used up all the sides. As we decrease the radius there
will come a time when we return exactly to our starting point. The resulting 
polygon will be nearly the regular polygon with n sides. If we continue to decrease 
the radius, we will overshoot the starting point starting to go around the circle 
again. When the radius has decreased enough, we will go around the circle exactly 
twice, creating a star. We can continue this way finding radii requiring more trips 
around the circle yielding stars with sharper points. However no edge can go as 
much as halfway around the circle so that the maximum number of times we can 
go around is m. This is where the first m areas come from. 
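
The radius search described above can be sketched with bisection. This is only an illustration under simplifying assumptions: all sides are laid out in the same direction, every side subtends a central angle of at most $\pi$, and the function names are hypothetical.

```python
import math


def total_angle(sides, radius):
    """Sum of the central angles subtended by chords of the given lengths."""
    return sum(2 * math.asin(min(1.0, a / (2 * radius))) for a in sides)


def radius_for_trips(sides, trips, iterations=200):
    """Radius at which the sides close up after `trips` trips around the circle."""
    target = 2 * math.pi * trips
    lo = max(sides) / 2            # smallest circle admitting the longest chord
    if total_angle(sides, lo) < target:
        raise ValueError("the sides cannot go around that many times")
    hi = 2 * lo
    while total_angle(sides, hi) > target:
        hi *= 2
    # total_angle decreases as the radius grows, so bisection applies.
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if total_angle(sides, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


sides = [29, 30, 31, 32, 33]
print(radius_for_trips(sides, 1))   # convex pentagon
print(radius_for_trips(sides, 2))   # star pentagon
```

For the side lengths 29, 30, 31, 32, 33 the two radii found this way should agree, to the displayed precision, with the values 26.385 and 16.512 listed in Diagram 1. A third trip is impossible for five sides, and the sketch raises an error in that case.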

There are other solutions. These arise as follows. We have so far assumed 
(implicitly) that, as we lay out all the sides around the circle, we are always 
proceeding in the same direction. But this is not necessary. We can lay down one 
of the sides in the opposite direction. Then we get a solution which looks 
something like a (2m — 1)-gon because the backwards edge almost coincides with 
the preceding and following edges. Here we have $2m + 1$ choices for the backwards
edge and for each of these choices we can still go around the circle $m - 1$
times. In general each choice will require a different radius and yield a different 
area. 


This explains the term $(m - 1)\binom{2m+1}{1}$ in the formula defining $\Delta_m$. Subsequent
terms are explained by selecting more, up to $m - 1$, of the sides to go backwards.

Diagram 1 illustrates the seven cyclic pentagons with side lengths 29, 30, 31, 32, 
33. 

For polygons with an even number $n = 2m + 2$ of sides, it appears that the
maximum number of areas is obtained if we take $2m + 1$ of the sides to be distinct
and nearly equal and the last side to be very small. We can then construct $\Delta_m$
solutions with the $2m + 1$ sides in the same orientation as above and the very
small side proceeding in the same direction as the majority. We can also construct
another $\Delta_m$ solutions with the very small side proceeding in the opposite direction.
Thus we have a total of $2\Delta_m$ solutions, in agreement with the form conjectured
above for the formula for the area of cyclic $(2m + 2)$-gons.


4. AREA OF A CYCLIC PENTAGON. Before getting to the details of the formula,
I would like to present some indication of the process by which it was
discovered.

I suspected from the outset, by analogy with the case of triangles and cyclic
quadrilaterals, that for a cyclic pentagon $16K^2$ would satisfy a monic polynomial
whose coefficients were symmetric polynomials in the squares of the sides of the
pentagon. Considerations like those in the previous section led me to believe that
the polynomial probably had degree 7. One way to check the conjecture was to
choose 5 integer-valued but nearly equal sides for which all seven solutions as
described above could be realized. I then computed, with (high precision)
approximate arithmetic, the seven areas K of the seven pentagons described in the
preceding construction and then formed the monic polynomial with the corresponding
seven values of $16K^2$ as roots. If my conjecture was correct, I expected
this polynomial to have integer coefficients (or near integer coefficients since the
arithmetic was approximate). It turned out that the coefficients of this polynomial
were always nearly integers, as predicted.

Diagram 1. Cyclic pentagons with side lengths 29, 30, 31, 32, 33; the seven
circumradii are r = 16.512, 17.026, 17.595, 17.991, 18.335, 18.651, 26.385.

An elaboration of this method for confirming this conjecture also leads to a
computation of the correct polynomial assuming it exists. It seemed sensible to
express the coefficients of the powers of $16K^2$ in terms of the elementary
symmetric functions

$$\sigma_1 = a_1^2 + \cdots + a_5^2$$
$$\sigma_2 = a_1^2 a_2^2 + \cdots + a_4^2 a_5^2$$

of $a_1^2, \ldots, a_5^2$. Thus, taking into account the homogeneity properties of the desired
polynomial, it would be of the form

$$u^7 + (c_1 \sigma_1^2 + c_2 \sigma_2) u^6 + (c_3 \sigma_1^4 + c_4 \sigma_1^2 \sigma_2 + \cdots) u^5 + \cdots$$

where $u = 16K^2$ for brevity and $c_1, c_2, c_3, \ldots$ were certain integer constants to be
determined. For a given pentagon with integral sides, this polynomial could be
computed exactly (rounding the near integer coefficients to the nearest integer).
Also the $\sigma_i$'s were easily found. Thus each such example gave 7 linear equations
satisfied by the $c_i$'s, one equation for each power of u. A simple enumeration
shows that only 70 unknown $c_i$'s are involved in the most complicated coefficient
(which is the constant term). With 70 examples (and a little luck so that the
resulting systems of equations were nonsingular) it was possible to solve for the
unknown coefficients.
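
The "simple enumeration" can be sketched as a partition count. Assigning weight $k$ to $\sigma_k$ and weight 2 to $u$, the coefficient of $u^{7-j}$ must have weight $2j$, so its monomials correspond to partitions of $2j$ into parts of size at most 5. The function name below is illustrative.

```python
def count_monomials(weight, max_index):
    """Number of monomials in sigma_1..sigma_max_index of the given weight,
    i.e. partitions of `weight` into parts of size at most `max_index`."""
    ways = [1] + [0] * weight
    for part in range(1, max_index + 1):
        for total in range(part, weight + 1):
            ways[total] += ways[total - part]
    return ways[weight]


# Unknowns in the coefficients of u^6, u^5, ..., u^0 for the pentagon.
counts = [count_monomials(2 * j, 5) for j in range(1, 8)]
print(counts)
print(counts[-1])   # the constant term, 70 unknowns as stated in the text
```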

This is how I found the formula in the first place. Many additional checks were 
available. One interesting check was that the computed $c_i$'s, which were sure to be
rational from the computation method, were in fact all integers. Observe however 
that even though I was virtually certain that the formula was correct, I really had 
no proof since the formula was based on approximate arithmetic and a conjecture. 

At first glance the formula looked like a random polynomial of 153 terms. 
However a little inspection showed that the polynomial had one very striking 
feature: every integer coefficient, and some were quite large, factored into very 
small primes. For example one coefficient was 27°. This suggested that the 
polynomial had some additional structure. 

By examining the polynomial carefully and manipulating it with the help of the 
computer program Mathematica, it turned out that it could be rewritten in a much 
more compact form as follows. 


Cyclic Pentagon Area Formula. Suppose that a pentagon inscribed in a circle has
side lengths $a_1, \ldots, a_5$. Let $\sigma_1, \ldots, \sigma_5$ be the elementary symmetric functions in the
squares of the sides and let u be 16 times the square of its area. Also define
$t_2, t_3, t_4, t_5$ by

$$t_2 = u - 4\sigma_2 + \sigma_1^2$$
$$t_3 = 8\sigma_3 + \sigma_1 t_2$$
$$t_4 = -64\sigma_4 + t_2^2$$
$$t_5 = 128\sigma_5.$$

Then the area of a cyclic pentagon satisfies

$$u t_4^3 + t_3^2 t_4^2 - 16 t_3^3 t_5 - 18 u t_3 t_4 t_5 - 27 u^2 t_5^2 = 0. \tag{4.1}$$

Note that, after substituting the expressions for the $t_i$'s in (4.1), the first term is
monic of degree 7 in u and the other terms have smaller degrees in u. Hence, the
formula yields a monic polynomial of degree 7 for u whose coefficients are
polynomials in the squares of the lengths of the sides. The largest real root of this
polynomial is 16 times the square of the area of the convex pentagon with the
given side lengths.
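
Formula (4.1) can be tested in floating point on pentagons built from points on a circle. The sketch below returns the left side of (4.1) divided by the sum of the absolute values of its five terms, so a value near zero indicates agreement. The test polygons and function names are arbitrary choices for this illustration.

```python
import cmath
import math
from itertools import combinations


def elementary_symmetric(values, k):
    """k-th elementary symmetric function of the given numbers."""
    return sum(math.prod(c) for c in combinations(values, k))


def pentagon_residual(radius, angles):
    """Relative size of the left side of (4.1) for a cyclic pentagon."""
    v = [radius * cmath.exp(1j * t) for t in angles]
    nxt = v[1:] + v[:1]
    squares = [abs(w - z) ** 2 for z, w in zip(v, nxt)]     # squared sides
    area = sum((z.conjugate() * w).imag for z, w in zip(v, nxt)) / 2
    u = 16 * area ** 2
    s1, s2, s3, s4, s5 = (elementary_symmetric(squares, k)
                          for k in range(1, 6))
    t2 = u - 4 * s2 + s1 ** 2
    t3 = 8 * s3 + s1 * t2
    t4 = -64 * s4 + t2 ** 2
    t5 = 128 * s5
    terms = [u * t4 ** 3, t3 ** 2 * t4 ** 2, -16 * t3 ** 3 * t5,
             -18 * u * t3 * t4 * t5, -27 * u ** 2 * t5 ** 2]
    return sum(terms) / sum(abs(t) for t in terms)


regular = [2 * math.pi * j / 5 for j in range(5)]
print(pentagon_residual(1.0, regular))
print(pentagon_residual(3.0, [0.0, 1.1, 2.5, 3.6, 5.0]))
```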

Having discovered the Formula (4.1), I did not at first understand its significance.
Some time later Bradley Brock pointed out the extremely interesting fact
that the left side of (4.1) resembles the discriminant of a cubic. He was right. It
turned out that it is precisely $1/(4u^2)$ times the discriminant with respect to z of
the cubic polynomial

$$z^3 + 2t_3 z^2 - u t_4 z + 2u^2 t_5.$$

Why this should be the case is still a mystery. Also it should be emphasized that
the quantities $t_3, t_4, t_5$, which are (essentially) the coefficients of the mystery cubic,
arose out of trying to make sense of the computed formula. They must have some
separate significance.

The vertex quotient formulation together with this observation about the
discriminant leads to a relatively simple proof of the formula which is within range
of being carried out entirely by hand. (In fact a computer was used.) Recall the
vertex quotients $q_j$ and the relations (3.1) and (3.2). Let $\tau_1, \ldots, \tau_5$ be the
elementary symmetric functions in the $q_j$'s and note that $\tau_5 = q_1 \cdots q_5 = 1$. From
(3.1) symmetric functions of the $a_j^2$'s are symmetric functions of the $q_j$'s and, from
(3.2), $16K^2$ is a symmetric function in the $q_j$'s. Hence all the quantities $t_2, t_3, t_4, t_5$
and $u = 16K^2$ can be expressed in terms of $\tau_1, \ldots, \tau_4$. We easily find

$$16K^2 = -R^4 (\tau_1 - \tau_4)^2.$$

Also a fairly routine but lengthy calculation, which has been omitted, yields

$$t_2 = 4R^4 (-10 + 3\tau_1 - \tau_2 - \tau_3 + 3\tau_4)$$
$$t_3 = -4R^6 (3\tau_1 - \tau_2 + \tau_3 - 3\tau_4)(\tau_1 - \tau_4)$$
$$t_4 = 16R^8 (9\tau_1 - \tau_2 + \tau_3 - 9\tau_4)(\tau_1 - \tau_2 + \tau_3 - \tau_4)$$
$$t_5 = -128R^{10} (\tau_1 - \tau_2 + \tau_3 - \tau_4)^2$$

Note that each of $t_3$, $t_4$, and $t_5$ factor as a product of two linear functions in the
$\tau$'s. This may be a hint for explaining the meaning of the t's.
It is now easily verified that

$$z^3 + 2t_3 z^2 - u t_4 z + 2u^2 t_5 = \left[ z - 16R^6 (\tau_1 - \tau_4)^2 \right] \left[ z - 4R^6 (\tau_1 - \tau_4)(\tau_1 - \tau_2 + \tau_3 - \tau_4) \right]^2.$$

Since the cubic has a double root, we may conclude that its discriminant is 0,
proving the formula.
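
Both algebraic steps of this argument can be spot-checked in exact integer arithmetic. The sketch below first compares the discriminant of the cubic with $4u^2$ times the left side of (4.1) at random integer points, and then checks the double-root factorization with $R = 1$, writing `a` for $\tau_1 - \tau_4$ and `b` for $\tau_2 - \tau_3$. Treating these as free integer variables is an assumption made for the test; the identities are polynomial, so random integer points serve as a sanity check rather than a proof.

```python
import random


def cubic_discriminant(b, c, d):
    """Discriminant of z^3 + b*z^2 + c*z + d."""
    return b * b * c * c - 4 * c ** 3 - 4 * b ** 3 * d + 18 * b * c * d - 27 * d * d


def lhs_41(u, t3, t4, t5):
    """Left side of (4.1)."""
    return (u * t4 ** 3 + t3 ** 2 * t4 ** 2 - 16 * t3 ** 3 * t5
            - 18 * u * t3 * t4 * t5 - 27 * u ** 2 * t5 ** 2)


def check_discriminant_identity(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        u, t3, t4, t5 = (rng.randint(-20, 20) for _ in range(4))
        disc = cubic_discriminant(2 * t3, -u * t4, 2 * u * u * t5)
        if disc != 4 * u * u * lhs_41(u, t3, t4, t5):
            return False
    return True


def check_double_root(trials=200, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.randint(-20, 20), rng.randint(-20, 20)
        u = -a * a
        t3 = -4 * (3 * a - b) * a
        t4 = 16 * (9 * a - b) * (a - b)
        t5 = -128 * (a - b) ** 2
        simple, double = 16 * a * a, 4 * a * (a - b)
        # Coefficients of (z - simple) * (z - double)^2.
        expected = (-(simple + 2 * double),
                    2 * simple * double + double ** 2,
                    -simple * double ** 2)
        if (2 * t3, -u * t4, 2 * u * u * t5) != expected:
            return False
        if lhs_41(u, t3, t4, t5) != 0:
            return False
    return True


print(check_discriminant_identity(), check_double_root())
```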


5. AREA OF A CYCLIC HEXAGON. Similar methods can be used to find the 
formula for the area of cyclic hexagons. Strangely the formula can be obtained 
from the pentagon formula by making a slight change in the definition of the t’s. 


Cyclic Hexagon Area Formula. Suppose that a hexagon inscribed in a circle has side
lengths $a_1, \ldots, a_6$ and let $\sigma_1, \ldots, \sigma_5$ be the first 5 elementary symmetric functions in
the squares of the sides and $\sigma_6$ be the product of the six sides and let u be 16 times the
square of its area. Also define $t_2, t_3, t_4, t_5$ by

$$t_2 = u - 4\sigma_2 + \sigma_1^2$$
$$t_3 = 8\sigma_3 + \sigma_1 t_2 - 16\sigma_6$$
$$t_4 = t_2^2 - 64\sigma_4 + 64\sigma_1 \sigma_6$$
$$t_5 = 128\sigma_5 + 32 t_2 \sigma_6.$$

Then the area of a cyclic hexagon satisfies either

$$u t_4^3 + t_3^2 t_4^2 - 16 t_3^3 t_5 - 18 u t_3 t_4 t_5 - 27 u^2 t_5^2 = 0 \tag{5.1}$$

or the equation obtained by replacing $\sigma_6$ by its negative.


The reader may wonder how the squared area of a given hexagon decides which
of these two equations to solve. The answer is to look at the product

$$p = (1 - q_1) \cdots (1 - q_6),$$

where the $q_j$'s are the vertex quotients. Since each $q_j$ has absolute value 1, and the
product of the $q_j$'s is 1, it is easily verified that p is always real. A squared area is a
root of (5.1) when p > 0 and a root of the alternate form if p < 0. In particular
the convex case yields a root of (5.1).
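
The hexagon formula and the sign rule for $p$ can be tried out numerically in the same way as the pentagon formula. In this sketch the sign of $p$ decides whether $\sigma_6$ or $-\sigma_6$ is used; the test hexagons are arbitrary and the function names are hypothetical.

```python
import cmath
import math
from itertools import combinations


def elementary_symmetric(values, k):
    return sum(math.prod(c) for c in combinations(values, k))


def hexagon_check(radius, angles):
    """Return (p, relative residual of (5.1) or of its alternate form)."""
    v = [radius * cmath.exp(1j * t) for t in angles]
    nxt = v[1:] + v[:1]
    p = 1 + 0j
    for z, w in zip(v, nxt):
        p *= 1 - w / z                      # factor 1 - q_j
    p = p.real                              # p is real up to rounding

    squares = [abs(w - z) ** 2 for z, w in zip(v, nxt)]
    area = sum((z.conjugate() * w).imag for z, w in zip(v, nxt)) / 2
    u = 16 * area ** 2
    s1, s2, s3, s4, s5 = (elementary_symmetric(squares, k)
                          for k in range(1, 6))
    s6 = math.prod(math.sqrt(x) for x in squares)   # product of the six sides
    if p < 0:
        s6 = -s6                            # alternate form of the equation
    t2 = u - 4 * s2 + s1 ** 2
    t3 = 8 * s3 + s1 * t2 - 16 * s6
    t4 = t2 ** 2 - 64 * s4 + 64 * s1 * s6
    t5 = 128 * s5 + 32 * t2 * s6
    terms = [u * t4 ** 3, t3 ** 2 * t4 ** 2, -16 * t3 ** 3 * t5,
             -18 * u * t3 * t4 * t5, -27 * u ** 2 * t5 ** 2]
    return p, sum(terms) / sum(abs(t) for t in terms)


regular = [math.pi * j / 3 for j in range(6)]
print(hexagon_check(1.0, regular))
print(hexagon_check(2.0, [0.0, 0.9, 2.0, 3.2, 4.1, 5.3]))
```

Both test hexagons are convex, so $p$ should come out positive and the residual of (5.1) should be at rounding level.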


6. AREA OF A CYCLIC HEPTAGON. These methods could be used in principle 
to derive the (degree 38) formula for the cyclic heptagon squared area. However 
the computations would be of rather heroic proportions, requiring for some of the 
coefficients the solution of a system of linear equations with 143307 unknowns. 
Perhaps someone can guess an answer like the compact formulas for the pentagon 
and hexagon, which might then be provable with a simple argument as above.

PICTURE PUZZLE
(from the collection of Paul Halmos) 


They have the same name. Well, not quite. 
(see page 537.)

Curves of Constant Precession


Paul D. Scofield 


1. INTRODUCTION. Given initial position and direction, the flight-path of a ship 
in Euclidean space is completely determined by how much it turns and how much 
it twists at each odometer reading. This is an intuitive interpretation of the 
Fundamental Theorem for Space Curves, which states that curvature $\kappa$ and torsion
$\tau$, as functions of arclength s, determine a space curve uniquely up to rigid motion.
This statement of the Fundamental Theorem ([14], §1-8) should be tempered with
the reservations expressed by Nomizu [12] and Wong & Lai [15].

Given a parametric space curve, there are well-known formulae for the arclength,
curvature, and torsion (as functions of the parameter). Given two functions
of one parameter (potentially curvature and torsion parametrized by arc-length)
one might like to find a parametrized space curve for which the two functions are
the curvature and torsion. This activity, called "solving natural equations" ([14],
§1-10), is generally achieved by solving Riccati equations like
$dw/ds = -i\tau/2 - i\kappa w + i\tau w^2/2$.

Although the solution generally exists, it usually cannot be obtained explicitly. 
Euler [6] found explicit integral formulae for plane curves (where $\tau = 0$) through
direct geometric analysis. Hoppe [9] developed a general method for solving the
natural equations for space curves by solving Riccati equations through a complicated
sequence of integral transformations. He digressed to obtain formulae for
the tangent, normal, and binormal indicatrices for general helices and essentially 
for curves of constant precession. Enneper [5] obtained explicit closed-form 
solutions for helices on revolved conic sections through direct geometric analysis. 

A curve of constant precession is defined by the property that as the curve is 
traversed with unit speed, its centrode revolves about a fixed axis with constant 
angle and constant speed. In this paper we obtain an arclength-parametrized 
closed-form solution of the natural equations for curves of constant precession 
through direct geometric analysis. As part of this analysis, we obtain a new 
theorem for curves of constant precession analogous with Lancret’s Theorem for 
general helices. We provide the first rendering of a curve of constant precession. 
We also note for the first time that curves of constant precession lie on circular 
hyperboloids of one sheet and have closure conditions that are simply related to 
their arclength, curvature, and torsion. These are 3-type curves, except one family
of closed 2-type curves (when $\omega = \sqrt{3}\,\mu$; see [2], [3], and [1]).

Given a closed $C^3$ curve in space, it is rather obvious that the curvature and
torsion functions will be periodic functions of the arclength, with period equal to the
total arclength. This is a necessary condition but, as the circular helices ($\kappa$ and $\tau$
both constant) show, not a sufficient condition that integral curves be closed.
Efimov [4] and Fenchel [7] independently formulated 


The Closed Curve Problem. Find (explicit) necessary and sufficient conditions that
determine when, given two periodic functions $\kappa(s)$ and $\tau(s)$ with the same period L,
the integral curve is closed.

This natural problem in elementary differential geometry remains open, despite
implicit solutions by Schmeidler [13] and Hwang [10]. Fenchel warned that there
may be no simple solution. Our investigation of curves of constant precession
began in an effort to find closure conditions for some collection of pairs of simple
periodic functions like $\kappa(s) = \omega \cos \mu s$ and $\tau(s) = \omega \sin \mu s$.


2. PLANE CURVES. Here we set out Euler's well-known integral solutions of the
natural equations for plane curves ([14], p. 26). We will designate coordinates and
geometric invariants of plane curves by subscript $\pi$. Identifying the angle between
the tangent line to the curve and the x-axis as

$$\varphi_\pi = \int \kappa_\pi \, ds_\pi,$$

it follows that

$$x_\pi = \int \cos \varphi_\pi \, ds_\pi \quad \text{and} \quad y_\pi = \int \sin \varphi_\pi \, ds_\pi$$

solve natural equations of the form

$$\kappa_\pi = \kappa_\pi(s_\pi) \quad \text{and} \quad \tau_\pi = 0.$$

If we change a constant of integration, we rotate or translate the curve.
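
Euler's integral solution lends itself to direct numerical integration. The following sketch uses a simple midpoint rule, starts at the origin with the tangent along the x-axis (one particular choice of the integration constants), and is tested on constant curvature, where the curve should be a circle of radius $1/\kappa$. The function name and step count are illustrative.

```python
import math


def integrate_plane_curve(kappa, length, steps=2000):
    """Points of the unit-speed plane curve with curvature kappa(s)."""
    h = length / steps
    phi = x = y = 0.0
    points = [(x, y)]
    for i in range(steps):
        k_mid = kappa((i + 0.5) * h)
        phi_mid = phi + k_mid * h / 2      # tangent angle at the step midpoint
        x += math.cos(phi_mid) * h
        y += math.sin(phi_mid) * h
        phi += k_mid * h
        points.append((x, y))
    return points


# Constant curvature 2 over arclength pi: one full circle of radius 1/2.
circle = integrate_plane_curve(lambda s: 2.0, math.pi)
print(circle[-1])
print(max(abs(math.hypot(px, py - 0.5) - 0.5) for px, py in circle))
```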

Still, it is a rare curve for which both $\kappa_\pi$ is a simple function and the above
integrals can be evaluated in closed form with elementary functions. Among the 
simplest are the circle, the logarithmic spiral, the circle involute, and the epicycloid 
([14], pp. 26-28). Enneper [5] showed that each of these is the projection along the 
axis of symmetry of a curve of constant slope (helix) on a conic surface of 
revolution: a circular cylinder, a cone, a paraboloid, and a sphere. 


3. CURVES OF CONSTANT SLOPE (HELICES). Here we set out the integral
solution of the natural equations for curves of constant slope or general helices
([14], pp. 33-35), and we set out an explicit parametrization for spherical helices,
never appealing to the solution of a Riccati equation. A curve of constant slope or
helix is defined by the property that the tangent makes a constant angle $\theta$ with a
fixed line $l$. We have its natural equations by


The Theorem of Lancret [11]. A necessary and sufficient condition that a curve be
of constant slope is that the ratio of curvature to torsion be constant.


In proving the theorem, it is observed that the constant slope and the constant
ratio are related by

$$\kappa/\tau = \tan\theta, \text{ constant.}$$

Taking $l$ as the z-axis, it is easy to observe that $dz = \cos\theta \, ds$. Moreover, the
projection of the curve onto the xy-plane has arclength element $ds_\pi = \sin\theta \, ds$ and
curvature $\kappa_\pi = \kappa \csc^2\theta$ (relating the radii of a helical osculating circle and the
planar osculating circle of its projection). Then using Euler's planar solution,

$$\varphi_\pi = \int \kappa_\pi \, ds_\pi = \csc\theta \int \kappa \, ds,$$

so

$$x(s) = \sin\theta \int_0^s \cos\left[ \csc\theta \int_0^{s_1} \kappa(s_2) \, ds_2 \right] ds_1$$
$$y(s) = \sin\theta \int_0^s \sin\left[ \csc\theta \int_0^{s_1} \kappa(s_2) \, ds_2 \right] ds_1$$
$$z(s) = s \cos\theta.$$

General helices are precisely the geodesics on general cylinders generated by lines
parallel with $l$. A general cylinder is the rectifying developable of its helices.

We will want a parametrization for spherical helices because the tangent
indicatrix of a curve of constant precession will prove to be a spherical helix. In
anticipation, we will designate the coordinates and arclength of spherical helices by
subscript t. Struik ([14], pp. 34-35) shows that for a helix on a sphere of radius r
making an angle $\theta$ with the z-axis, the projection onto the xy-plane is an
epicycloid with fixed radius $a = r \cos\theta$ and rolling radius $b = r \sin^2(\theta/2)$.
Substituting these into his epicycloid parametrization (p. 27), we obtain

$$x_t(\psi) = \frac{r}{2}(1 + \cos\theta) \cos\psi - \frac{r}{2}(1 - \cos\theta) \cos\left( \frac{1 + \cos\theta}{1 - \cos\theta} \psi \right)$$
$$y_t(\psi) = \frac{r}{2}(1 + \cos\theta) \sin\psi - \frac{r}{2}(1 - \cos\theta) \sin\left( \frac{1 + \cos\theta}{1 - \cos\theta} \psi \right)$$
$$z_t(\psi) = r \sin\theta \cos\left( \frac{\cos\theta}{1 - \cos\theta} \psi \right),$$

where

$$s_\pi = r \sin\theta \tan\theta \cos\left( \frac{\cos\theta}{1 - \cos\theta} \psi \right),$$
$$s_t = r \tan\theta \cos\left( \frac{\cos\theta}{1 - \cos\theta} \psi \right).$$

The spherical helix has an arc of length $2r \tan\theta$ between heights $z = \pm r \sin\theta$
beyond which no tangent to the sphere makes an angle as small as $\theta$ with the
z-axis. The parametric extension gives a sequence of arcs which join in cusps at
their endpoints. This piecewise smooth curve is closed if and only if $\cos\theta$ is
rational. All arcs of a spherical helix with $\cos\theta = 8/17$ are rendered in Figure 1.


4. CURVES OF CONSTANT PRECESSION. Here we characterize curves of
constant precession. We will denote the moving orthonormal frame of tangent,
normal, and binormal vectors by t, n, and b, and we will differentiate with respect
to arclength, using the Frenet equations

$$t' = \kappa n$$
$$n' = -\kappa t + \tau b$$
$$b' = -\tau n$$

([8], [14], §1-6). Let $C = \tau t + \kappa b$ denote the centrode, the Frenet frame's axis of
instantaneous rotation ([14], §1-6, Exercise 18, and [7]). Fix arbitrary constants
$\omega > 0$, $\mu$, and $\alpha = \sqrt{\omega^2 + \mu^2}$. Set $A = C + \mu n$ and fix the line $l$ parallel with
$A(0)$. We use $\angle(\cdot, \cdot)$ to denote the angle between two vectors.

Figure 1. Sixteen arcs of a helix on the unit sphere with $\cos\theta = 8/17$. They form the tangent indicatrix
of the curve in Figure 2. (Visualization assisted by The Geometry Center at the University of
Minnesota.)


Lemma. The following are equivalent:
(i) $|C| = \omega$
(ii) $\angle(C, A) = \cos^{-1}(\omega/\alpha)$
(iii) $|n'| = \omega$
(iv) $\angle(n, A) = \cos^{-1}(\mu/\alpha)$
(v) $|A| = \alpha$.


Proof: Since $|C|^2 = \kappa^2 + \tau^2 = |n'|^2$ and $|A|^2 = \kappa^2 + \tau^2 + \mu^2$, it is clear that (i),
(iii) and (v) are equivalent. Interpreting (ii) as

$$\kappa^2 + \tau^2 = C \cdot A = \frac{\omega}{\alpha} \sqrt{\kappa^2 + \tau^2} \sqrt{\kappa^2 + \tau^2 + \mu^2}$$

implies that (i) is equivalent to (ii), and interpreting (iv) as

$$\mu = n \cdot A = \frac{\mu}{\alpha} |n| \, |A|$$

implies that (iv) is equivalent to (v). Q.E.D.


Lemma. Given any of (i)-(v), the following are equivalent:
(vi) $|C'| = |\omega\mu|$
(vii) A is parallel with $l$.

Proof: Since $A' = C' + \mu n'$,

$$A' = 0 \iff C' = -\mu n' \iff |C'| = |\mu| \, |n'|.$$

Thus, it follows from (ii) and (v) that (vi) and (vii) are equivalent. Q.E.D.


A curve of constant precession is defined (somewhat redundantly) by the 
property that, as it is traversed with unit speed, its centrode revolves about a fixed 
line / in space (the axis) with constant angle and constant speed. As a conse- 
quence, its Frenet frame precesses about /, while its principal normal revolves 
about / with constant complementary angle and constant speed. We have its 
natural equations by the following analogy with Lancret’s Theorem. 


Theorem 1. A necessary and sufficient condition that a curve be of constant
precession is that $\kappa(s) = \omega \sin \mu s$ and $\tau(s) = \omega \cos \mu s$, up to reflection or phase
shift of arclength, for constants $\omega$ and $\mu$.


Proof: Conditions (v) and (vii) are true if and only if $A' = 0$, but

$$A' = (\tau' - \mu\kappa) t + (\kappa' + \mu\tau) b$$

and uniqueness of solutions of pairs of linear equations imply that $A' = 0$ if and
only if $\kappa(s) = \omega \sin \mu s$ and $\tau(s) = \omega \cos \mu s$ (up to reflection or phase shift).
Q.E.D.


5. SOLVING THE NATURAL EQUATIONS. Here, without solving a Riccati
equation but using results from Sections 3 and 4, we obtain an arclength
parametrization for curves of constant precession. Condition (iv) of the lemmata in
Section 4 implies, since $t' = \kappa n$, that t is a curve of constant slope (hence a helix on
the unit sphere). We take $\kappa = \pm\omega \sin \mu s$ and continue to designate the tangent
indicatrix by subscript t. Arclength along the curve and along its tangent indicatrix
are related by

$$\frac{ds_t}{ds} = \kappa = \pm\omega \sin \mu s,$$

so

$$s_t = \mp\frac{\omega}{\mu} \cos \mu s + C.$$

Taking the lower signs, $C = 0$, and $\alpha = |A| = \sqrt{\omega^2 + \mu^2}$, while substituting $r = 1$
and $\cos\theta = \mu/\alpha$ into the formula for $s_t$ in Section 3, we obtain

$$s_t = \frac{\omega}{\mu} \cos\left( \frac{\mu}{\alpha - \mu} \psi \right),$$

hence

$$s = \frac{1}{\alpha - \mu} \psi,$$

giving a remarkably simple reparametrization

$$x'(s) = x_t(s) = \frac{\alpha + \mu}{2\alpha} \cos(\alpha - \mu)s - \frac{\alpha - \mu}{2\alpha} \cos(\alpha + \mu)s$$
$$y'(s) = y_t(s) = \frac{\alpha + \mu}{2\alpha} \sin(\alpha - \mu)s - \frac{\alpha - \mu}{2\alpha} \sin(\alpha + \mu)s$$
$$z'(s) = z_t(s) = \frac{\omega}{\alpha} \cos \mu s.$$

1995] CURVES OF CONSTANT PRECESSION 535 


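Theorem 2. An arclength parametrization of a curve of constant precession with
natural equations $\kappa(s) = -\omega \sin \mu s$ and $\tau(s) = \omega \cos \mu s$ is given by

$$x(s) = \frac{\alpha + \mu}{2\alpha} \frac{\sin(\alpha - \mu)s}{\alpha - \mu} - \frac{\alpha - \mu}{2\alpha} \frac{\sin(\alpha + \mu)s}{\alpha + \mu}$$
$$y(s) = -\frac{\alpha + \mu}{2\alpha} \frac{\cos(\alpha - \mu)s}{\alpha - \mu} + \frac{\alpha - \mu}{2\alpha} \frac{\cos(\alpha + \mu)s}{\alpha + \mu}$$
$$z(s) = \frac{\omega}{\mu\alpha} \sin \mu s$$

where $\omega$, $\mu$, and $\alpha = \sqrt{\omega^2 + \mu^2}$ are constant. Moreover, the curve lies on the
circular hyperboloid of one sheet

$$x^2 + y^2 - \frac{\mu^2}{\omega^2} z^2 = \frac{4\mu^2}{\omega^4}.$$

The curve is closed if and only if $\mu/\alpha$ is rational.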
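
The statements of Theorem 2 can be spot-checked numerically. The sketch below codes the parametrization and its derivative, then tests unit speed, the hyperboloid equation (with constants $\mu^2/\omega^2$ and $4\mu^2/\omega^4$, which follow from the parametrization itself), and closure for $\omega = 15$, $\mu = 8$, where $\alpha = 17$ and the three frequencies 9, 25, 8 share the period $2\pi$. Function names are illustrative.

```python
import math


def constant_precession(omega, mu):
    """Return (position, tangent) functions of arclength s (omega > 0, mu != 0)."""
    alpha = math.hypot(omega, mu)
    plus = (alpha + mu) / (2 * alpha)
    minus = (alpha - mu) / (2 * alpha)

    def position(s):
        x = (plus * math.sin((alpha - mu) * s) / (alpha - mu)
             - minus * math.sin((alpha + mu) * s) / (alpha + mu))
        y = (-plus * math.cos((alpha - mu) * s) / (alpha - mu)
             + minus * math.cos((alpha + mu) * s) / (alpha + mu))
        z = omega / (mu * alpha) * math.sin(mu * s)
        return x, y, z

    def tangent(s):
        return (plus * math.cos((alpha - mu) * s) - minus * math.cos((alpha + mu) * s),
                plus * math.sin((alpha - mu) * s) - minus * math.sin((alpha + mu) * s),
                omega / alpha * math.cos(mu * s))

    return position, tangent


omega, mu = 15.0, 8.0
position, tangent = constant_precession(omega, mu)
for s in (0.0, 0.3, 1.7, 4.0):
    x, y, z = position(s)
    speed = math.sqrt(sum(c * c for c in tangent(s)))
    print(speed, x * x + y * y - (mu / omega) ** 2 * z * z)
print(4 * mu ** 2 / omega ** 4)
```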


A curve of constant precession is rendered in Figure 2. The tangent indicatrix, a
spherical helix, has cusps where $\kappa(s) = -\omega \sin \mu s = 0$.


Figure 2. A curve of constant precession with $\omega = 15$ and $\mu = 8$, shown on its circular hyperboloid. It
is an integral curve of the indicatrix in Figure 1.

Don't talk to me of your Archimedes' lever. He
was an absentminded person with a mathematical
imagination. Mathematics commands all my respect,
but I have no use for engines. Give me the
right word and the right accent and I will move
the world.


Joseph Conrad 
Preface to A Personal Record. Garden City NY: Doubleday, 
Doran and Co. Inc., 1929, p. xiil. 


Answer to Picture Puzzle 


(p. 530) 
Carl Ludwig Siegel and Grahame Segal.

Matrix Expansion by Orthogonal
Kronecker Products


Jeffery C. Allen 


The singular-value decomposition (SVD) provides an expansion of a real $M \times N$
matrix A by orthogonal outer products [3, page 144]:

$$A = \sum_{k=1}^{K} s_k \mathbf{u}_k \mathbf{v}_k^T.$$

The singular values are ordered $s_1 \ge s_2 \ge \cdots \ge s_K \ge 0$, where $K = \min\{M, N\}$.
The $\mathbf{u}_k$'s and the $\mathbf{v}_k$'s are the orthonormal left and right singular vectors:

$$\mathbf{u}_k^T \mathbf{u}_{k'} = \mathbf{v}_k^T \mathbf{v}_{k'} = \begin{cases} 1 & k = k' \\ 0 & k \ne k', \end{cases}$$

where $\mathbf{u}^T$ denotes the vector transpose. This note demonstrates that by a simple
rescanning of the matrix A, the SVD can also produce a variety of expansions in
terms of orthogonal Kronecker products.


NOTATION. Vectors are denoted by a bold lowercase letter. The "vec" of a
matrix is the vector that results by stacking the columns. For example,

$$\operatorname{vec}([\mathbf{x}_1, \mathbf{x}_2]) = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}.$$

Mat(M, N) denotes the linear space of real $M \times N$ matrices. BlkMat($M_1$, $N_1$; $M_2$,
$N_2$) denotes the linear space of real $M_1 \times N_1$ block matrices with $M_2 \times N_2$
blocks. The inner product of two matrices is given by $\langle U, V \rangle = \operatorname{vec}(U)^T \operatorname{vec}(V)$.

The Kronecker product of two matrices is $U \otimes V = [u_{m,n} V]$ [3, page 243]. For
example,

$$\begin{bmatrix} u_{1,1} & u_{1,2} \\ u_{2,1} & u_{2,2} \end{bmatrix} \otimes V = \begin{bmatrix} u_{1,1} V & u_{1,2} V \\ u_{2,1} V & u_{2,2} V \end{bmatrix}.$$

Note that in $U \otimes V$, every element of U multiplies every element of V. Thus, the
Kronecker product contains the same products as the outer product $\operatorname{vec}(V) \operatorname{vec}(U)^T$.
The map taking Kronecker products to outer products is the rescanning function.

THE RESCANNING FUNCTION. Let $M_1$, $M_2$, $N_1$, and $N_2$ be positive integers.
Use these to define the mapping block: Mat($M_2 N_2$, $M_1 N_1$) $\to$ BlkMat($M_1$, $N_1$;
$M_2$, $N_2$)

$$\operatorname{block}([\mathbf{a}_1, \ldots, \mathbf{a}_{M_1 N_1}]) = \begin{bmatrix} A_1 & A_{M_1 + 1} & \cdots & A_{(N_1 - 1)M_1 + 1} \\ A_2 & A_{M_1 + 2} & \cdots & A_{(N_1 - 1)M_1 + 2} \\ \vdots & \vdots & & \vdots \\ A_{M_1} & A_{2M_1} & \cdots & A_{N_1 M_1} \end{bmatrix},$$

where each block $A_k$ is a real $M_2 \times N_2$ matrix determined from $\mathbf{a}_k$ by
$\operatorname{vec}(A_k) = \mathbf{a}_k$. Thus, block merely rescans a matrix of size $M_2 N_2 \times M_1 N_1$ into an $M_1 \times N_1$
block matrix with $M_2 \times N_2$ blocks. It is straightforward to verify the following:


(B-1) block is a linear mapping which is one-to-one and onto.
(B-2) block preserves the matrix inner product.
(B-3) block maps outer products onto Kronecker products.


To illustrate the last claim, let $U \in$ Mat($M_1$, $N_1$) and $V \in$ Mat($M_2$, $N_2$). Then
$A = U \otimes V$ is an $M_1 \times N_1$ block matrix with $M_2 \times N_2$ blocks $A_{m + (n-1)M_1} = u_{m,n} V$
for $m = 1, \ldots, M_1$ and $n = 1, \ldots, N_1$. In matrix form, this can be written

$$\operatorname{block}(\operatorname{vec}(V) \operatorname{vec}(U)^T) = U \otimes V.$$

Since block is an isometry mapping outer products to Kronecker products, it
should come as no surprise that block lifts the SVD expansion to the Kronecker
expansion.
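
Property (B-3) can be illustrated with a few lines of plain Python on nested lists. The sketch assumes the column-stacking convention for vec and the block layout given above (column $k = m + (n-1)M_1$ of the input becomes the block in block-row $m$ and block-column $n$); the function names and the small test matrices are illustrative only.

```python
def vec(matrix):
    """Stack the columns of a matrix (list of rows) into one list."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(cols) for i in range(rows)]


def outer(x, y):
    """Outer product x y^T of two lists."""
    return [[xi * yj for yj in y] for xi in x]


def kron(u, v):
    """Kronecker product of two matrices given as lists of rows."""
    m1, n1, m2, n2 = len(u), len(u[0]), len(v), len(v[0])
    return [[u[r // m2][c // n2] * v[r % m2][c % n2]
             for c in range(n1 * n2)] for r in range(m1 * m2)]


def block(a, m1, n1, m2, n2):
    """Rescan an (m2*n2) x (m1*n1) matrix into m1 x n1 blocks of size m2 x n2."""
    out = [[0] * (n1 * n2) for _ in range(m1 * m2)]
    for k in range(m1 * n1):            # column k holds vec of one block
        m, n = k % m1, k // m1          # block position (0-based)
        for j in range(n2):
            for i in range(m2):
                out[m * m2 + i][n * n2 + j] = a[i + j * m2][k]
    return out


U = [[1, 2, 3], [4, 5, 6]]              # M1 = 2, N1 = 3
V = [[7, 8], [9, 10]]                   # M2 = 2, N2 = 2
rescanned = block(outer(vec(V), vec(U)), 2, 3, 2, 2)
print(rescanned == kron(U, V))
```

Because block only permutes entries, it also preserves the matrix inner product, which is property (B-2).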


THE KRONECKER EXPANSION. Let $A \in$ Mat(M, N) where $M = M_1 M_2$ and
$N = N_1 N_2$. Let $K = \min\{M_1 N_1, M_2 N_2\}$. Then there are matrices $U_1, \ldots, U_K \in$
Mat($M_1$, $N_1$), matrices $V_1, \ldots, V_K \in$ Mat($M_2$, $N_2$), and numbers $s_1 \ge \cdots \ge s_K \ge 0$
such that

$$A = \sum_{k=1}^{K} s_k U_k \otimes V_k$$

where the following orthogonality conditions hold:

$$\langle U_k, U_{k'} \rangle = \langle V_k, V_{k'} \rangle = \begin{cases} 1 & k = k' \\ 0 & k \ne k'. \end{cases}$$

Proof: By (B-1), block is invertible. Set $\tilde{A} = \operatorname{block}^{-1}(A)$. Write the SVD of matrix
$\tilde{A}$ in the form

$$\tilde{A} = \sum_{k=1}^{K} s_k \mathbf{v}_k \mathbf{u}_k^T.$$

Let $U_k \in$ Mat($M_1$, $N_1$) be determined from $\operatorname{vec}(U_k) = \mathbf{u}_k$. Likewise, let $V_k \in$
Mat($M_2$, $N_2$) be determined from $\operatorname{vec}(V_k) = \mathbf{v}_k$. Then by (B-1) and (B-3), the
Kronecker expansion is obtained

$$A = \operatorname{block}(\tilde{A}) = \sum_{k=1}^{K} s_k \operatorname{block}(\mathbf{v}_k \mathbf{u}_k^T) = \sum_{k=1}^{K} s_k U_k \otimes V_k.$$

By (B-2), it is straightforward to verify the orthogonality of the $U_k$'s and the $V_k$'s.

Remarks. Since the SVD also determines optimal reduced rank approximations, 
best approximations using a fixed number of Kronecker products can be obtained 
from this Kronecker expansion. The motivation for this Kronecker expansion or 
“block” SVD arose in an image-processing application. Illuminating discussions of 
image processing and applications of the SVD, block matrix computations, and 
Kronecker products are found in Gonzalez and Wintz [6] or Jain [5]. An excellent 
treatment of the Kronecker product is found in Horn and Johnson [3]. Further 
generalizations of the Kronecker product and signal-processing applications are 
found in [4], [7], [2], [1].

Injective Polynomial Maps
Are Automorphisms


Walter Rudin 


This article presents a simple elementary proof of the following result. 
Theorem A. If $F: C^n \to C^n$ is a polynomial map which is one-to-one, then


(a) $F(C^n) = C^n$, and
(b) $F^{-1}: C^n \to C^n$ is also a polynomial map.

Here n is a positive integer, and $C^n$ is the set of all $z = (z_1, \ldots, z_n)$, each $z_i$
lying in the complex field C. In general, the notation $\Phi: X \to Y$ indicates that $\Phi$ is
a map whose domain is X and whose range lies in Y. To say that F is a polynomial
map means that $F = (f_1, \ldots, f_n)$ and each component $f_i$ of F is a polynomial,
mapping $C^n$ into C.

Theorem A may be regarded as a small step toward a confirmation of the 
so-called Jacobian conjecture, which claims that if $F: C^n \to C^n$ is a polynomial
map whose Jacobian is a non-zero constant, then F is a polynomial automorphism
of $C^n$, i.e., F is one-to-one and satisfies (a) and (b). This dates back to 1939 [5] but
is still unproved (in June 1994), even for n = 2. Its history, many references, and 
some partial results, can be found in [2]. 

Theorem A shows that the Jacobian conjecture would be proved if one could 
show, for polynomial maps $F: C^n \to C^n$, that "locally one-to-one" implies "globally
one-to-one." This formulation of the problem points to an interesting difference
between $C^n$ and $R^n$: Serguey Pinchuk [8] has (surprisingly!) constructed a
polynomial map $F: R^2 \to R^2$ whose Jacobian has no zero in $R^2$ but which is not
one-to-one. The difference is, of course, that on $R^n$ there are nonconstant
polynomials without zeros, whereas this cannot happen on $C^n$.

Theorem A is not new. In [7] Don Newman proved (a) with $R^2$ in place of $C^n$.
In [3] this was extended to $R^n$, for arbitrary n, with the aid of a good dose of
homology theory; that paper also contains a brief sketch of the analogous result for
maps from $k^n$ to $k^n$, for arbitrary algebraically closed fields k. Ax [1; Th. 2]
extended this to morphisms of algebraic varieties, using nonprincipal ultraproducts 
of fields. Theorem (2.1) on p. 294 of [2] lists eight (mostly algebraic) conditions on 
polynomial maps F that are equivalent; Theorem A is one of those equivalences: 
F is one-to-one if and only if F is an automorphism. 

I believe that the proof given here is much simpler than any of the above. 
(Proof: I have no trouble understanding it.) It uses two facts from complex 
analysis: 


Fact 1. If 

(i) $u, v: \mathbb{C}^n \to \mathbb{C}$ are polynomials with no common factor of positive degree, 
(ii) $\Omega$ is an open subset of $\mathbb{C}^n$, and 
(iii) $v(p_0) = 0$ at some point $p_0$ in $\Omega$, 

then $\Omega$ contains points $p$ at which $v(p) = 0$ but $u(p) \ne 0$. 


This must be prehistoric. A proof can be found on pp. 14, 15 of [11]. Note that it 
fails on $\mathbb{R}^n$. 


Example: $u(x, y) = x^2 + y^2$, $v(x, y) = x^2 + (y - x)^2$. 


Fact 2. If $F$ satisfies the hypothesis of Theorem A, then the Jacobian of $F$ is $\ne 0$ at 
every point of $\mathbb{C}^n$. 


This is in fact true for holomorphic maps from open sets in $\mathbb{C}^n$ into $\mathbb{C}^n$ that are 
locally one-to-one, and it used to be a fairly difficult theorem (see, for instance, [6; 
pp. 86-88]) until Jean-Pierre Rosay published a truly simple proof [9]. 

Combined with the inverse function theorem (Th. 9.24 in [10]), Fact 2 implies 
what will actually be used, namely: 


The range $F(\mathbb{C}^n)$ of $F$ is an open subset of $\mathbb{C}^n$. 

(Remark: That $F(\mathbb{C}^n)$ is open is also an immediate consequence of Brouwer’s 
“Invariance of Domain” theorem, concerning continuous one-to-one maps from 
$\mathbb{R}^m$ into $\mathbb{R}^m$ [4; p. 95], but that theorem is much more difficult than the route via 
Fact 2.) 

We now start the proof. 

Let $f_1, \dots, f_n$ be the components of $F$, and let $k$ be the subfield of $\mathbb{C}$ 
generated by the coefficients of the polynomials $f_i$. Since $k$ is countable, there are 
only countably many polynomials with coefficients in $k$. The union of their 
zero-sets (ignoring the zero-polynomial) is thus a countable union of closed sets 
without interior, hence cannot cover the complete metric space $\mathbb{C}^n$. It follows that 
there is a point $\xi$ in $\mathbb{C}^n$, fixed from now on, with the following property: 


(*) If $f: \mathbb{C}^n \to \mathbb{C}$ is a polynomial with coefficients in $k$, and $f(\xi) = 0$, then 
$f(z) = 0$ for every $z$ in $\mathbb{C}^n$. 


Put $\eta = F(\xi)$. 


Claim. The extension fields 

$$k(\eta) = k(\eta_1, \dots, \eta_n)$$ 

and 

$$k(\eta, \xi) = k(\eta_1, \dots, \eta_n, \xi_1, \dots, \xi_n)$$ 

are equal. 


Here $k(\eta)$ is the smallest subfield of $\mathbb{C}$ that contains $k$ and $\eta_1, \dots, \eta_n$, and 
similarly for $k(\eta, \xi)$. 

If the claim is false, there is an isomorphism $\varphi$ of $k(\eta, \xi)$ into $\mathbb{C}$ that fixes every 
element of $k(\eta)$ but moves some $\xi_i$. (See the lemma at the end of the paper.) Put 

$$w = (\varphi(\xi_1), \dots, \varphi(\xi_n))$$ 

and note that $w \ne \xi$. 
Since $f_j(\xi) = \eta_j$ is in $k(\eta)$ and the coefficients of $f_j$ are in $k$, we have, for 
$1 \le j \le n$, 

$$f_j(\xi) = \varphi(f_j(\xi_1, \dots, \xi_n)) = f_j(\varphi(\xi_1), \dots, \varphi(\xi_n)) = f_j(w).$$ 

Hence $F(\xi) = F(w)$, which contradicts the assumption that $F$ is one-to-one. This 
proves the claim. 

In particular, each $\xi_j$ is in $k(\eta)$. This means that there are polynomials $u_j$, $v_j$, 
with coefficients in $k$, and without common factors of positive degree, such that 
$v_j(\eta) \ne 0$ and 

$$\xi_j = u_j(\eta)/v_j(\eta) \qquad (1 \le j \le n). \tag{1}$$ 

Thus $\xi_j v_j(F(\xi)) - u_j(F(\xi)) = 0$. Property (*) implies now that 

$$z_j v_j(F(z)) = u_j(F(z)) \qquad (1 \le j \le n,\ z \in \mathbb{C}^n). \tag{2}$$ 

Put $\Omega = F(\mathbb{C}^n)$. We saw, as a consequence of Fact 2, that $\Omega$ is open. If $v_j$ had a 
zero in $\Omega$, Fact 1 would imply that there is a point in $\Omega$ where $v_j = 0$ but $u_j \ne 0$, 
contradicting (2). 

Hence $v_j \circ F: \mathbb{C}^n \to \mathbb{C}$ is a polynomial without zeros, hence is constant, hence 
each $v_j$ is constant. Without loss of generality, $v_j = 1$. Putting 

$$G = (u_1, \dots, u_n), \tag{3}$$ 

(2) becomes 

$$G(F(z)) = z \quad \text{for all } z \text{ in } \mathbb{C}^n. \tag{4}$$ 

Hence $F(G(F(z))) = F(z)$. This says that $F \circ G$ is the identity map on $\Omega$. If two 
polynomials agree on $\Omega$, they agree on $\mathbb{C}^n$. Thus 

$$F(G(w)) = w \quad \text{for all } w \text{ in } \mathbb{C}^n. \tag{5}$$ 

The theorem follows from (4) and (5), with $F^{-1} = G$. 
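
As a small illustration of what identities (4) and (5) assert, the following sketch checks them for one injective polynomial map of $\mathbb{C}^2$. The map F(x, y) = (x + y^2, y) and its inverse G are hypothetical examples chosen for this sketch; they do not come from the article, and a finite check at sample points is of course not a proof.

```python
# Hypothetical example (not from the article): an injective polynomial map of C^2
# together with its polynomial inverse.

def F(z):
    """F(x, y) = (x + y^2, y)."""
    x, y = z
    return (x + y * y, y)

def G(w):
    """G(u, v) = (u - v^2, v), the candidate polynomial inverse of F."""
    u, v = w
    return (u - v * v, v)

# Sample points with small Gaussian-integer coordinates, so that the
# floating-point complex arithmetic below is exact.
coords = [complex(a, b) for a in range(-2, 3) for b in range(-2, 3)]
points = [(s, t) for s in coords for t in coords]

# Identity (4): G(F(z)) = z, and identity (5): F(G(w)) = w.
assert all(G(F(p)) == p for p in points)
assert all(F(G(p)) == p for p in points)
```
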


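Lemma. Suppose that $\mathcal{F}$ is a subfield of $\mathbb{C}$, $\xi_1, \dots, \xi_n$ are in $\mathbb{C}$, and $\mathcal{F}_1 = 
\mathcal{F}(\xi_1, \dots, \xi_n)$. Then either $\mathcal{F}_1 = \mathcal{F}$, or there is an isomorphism $\varphi$ of $\mathcal{F}_1$ into $\mathbb{C}$ 
that fixes every element of $\mathcal{F}$ but moves at least one $\xi_i$. 


Proof: Assume $\mathcal{F}_1 \ne \mathcal{F}$. Then there is a nonempty subset of $\{\xi_1, \dots, \xi_n\}$, say 
$\{\xi_1, \dots, \xi_j\}$ (after reordering), that is minimal with respect to the property 

$$\mathcal{F}_1 = \mathcal{F}(\xi_1, \dots, \xi_j).$$ 

Put $\mathcal{F}_0 = \mathcal{F}(\xi_1, \dots, \xi_{j-1})$. (This is $\mathcal{F}$ when $j = 1$.) Then 

$$\mathcal{F} \subset \mathcal{F}_0 \subsetneq \mathcal{F}_0(\xi_j) = \mathcal{F}_1.$$ 

Let $\varphi$ fix every element of $\mathcal{F}_0$ and choose $\varphi(\xi_j)$ as follows: 

If $\xi_j$ is transcendental over $\mathcal{F}_0$, let $\varphi(\xi_j)$ be any complex number $\ne \xi_j$ that is 
also transcendental over $\mathcal{F}_0$ (such as $1 + \xi_j$). 

If $\xi_j$ is algebraic over $\mathcal{F}_0$, with minimal polynomial $p(x)$, let $\varphi(\xi_j)$ be another 
root of $p(x)$. 

To every $w$ in $\mathcal{F}_1$ corresponds a rational function $r$, with coefficients in $\mathcal{F}_0$, 
such that $w = r(\xi_j)$. Setting $\varphi(w) = r(\varphi(\xi_j))$ gives the desired isomorphism. 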

An Elementary Proof of the Simplicity 
of the Mathieu Groups $M_{11}$ and $M_{23}$ 


Robin J. Chapman 


In this note I prove the simplicity of the Mathieu groups of prime degree, $M_{11}$ and 
$M_{23}$, using no group theory beyond Sylow’s theorems and basic facts about 
permutation groups. The only facts about the groups $M_{11}$ and $M_{23}$ which are 
needed are their orders, and the fact that they are transitive permutation groups 
on 11 and 23 letters respectively. Most textbooks dealing with the Mathieu groups 
prove the simplicity of $M_{11}$ by more complicated arguments. For instance, Rotman 
[1] uses a lemma of Burnside whose proof lies beyond the scope of introductory 
courses on group theory. 

Let $p$ be a prime number, and let $G$ be a subgroup of $S_p$, the symmetric group of 
degree $p$. It is easy to see that $p \mid |G|$ if and only if $G$ is transitive, i.e., if $1 \le j, 
k \le p$ then there is $\sigma \in G$ with $\sigma(j) = k$, for the only elements of order $p$ in $S_p$ 
are the $p$-cycles. We shall assume that $G$ is transitive, and by replacing $G$ by a 
conjugate if necessary we may also assume that $G$ has $P = \langle (1\,2\,\cdots\,p) \rangle$ as a 
Sylow $p$-subgroup. Let $n = |G|$, let $m_G$ be the number of Sylow $p$-subgroups of $G$, 
and let $r_G$ be the index $|N_G(P) : P|$, where $N_G(P)$ is the normalizer of $P$ in $G$. As all 
Sylow $p$-subgroups of $G$ are conjugate in $G$, 

$$n = |G| = |P|\,|N_G(P) : P|\,|G : N_G(P)| = p\, r_G\, m_G.$$ 

By Sylow’s third theorem $m_G \equiv 1 \pmod p$. Also $P \le N_G(P) \le N_{S_p}(P)$, and $N_{S_p}(P)$ 
is the group of all affine transformations modulo $p$, i.e., the set of maps of the 
form 

$$x \mapsto ax + b \pmod p$$ 

where $p \nmid a$. Hence $|N_{S_p}(P)| = p(p - 1)$ and so $r_G = |N_G(P) : P|$ is a factor of 
$p - 1$. It follows that $r_G$ is the least positive residue of $n/p$ modulo $p$. The 
following lemma forms the basis of our proof of simplicity. 
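
The claim that the normalizer of $P$ in $S_p$ has order $p(p-1)$ can be confirmed by brute force for small primes. The sketch below is an illustration only; the function names are hypothetical, permutations act on $\{0, \dots, p-1\}$ rather than $\{1, \dots, p\}$, and the search is exhaustive, so it is practical only for very small $p$.

```python
from itertools import permutations

def compose(s, t):
    """Return the permutation s∘t (apply t first, then s) on {0, ..., p-1}."""
    return tuple(s[t[i]] for i in range(len(s)))

def inverse(s):
    """Return the inverse of the permutation s."""
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)

def normalizer_order(p):
    """Count the elements of S_p that normalize P = <x -> x + 1 mod p>."""
    cycle = tuple((i + 1) % p for i in range(p))
    # Collect the p elements of the cyclic group P generated by the p-cycle.
    P = set()
    g = tuple(range(p))
    for _ in range(p):
        P.add(g)
        g = compose(cycle, g)
    count = 0
    for s in permutations(range(p)):
        # s normalizes P exactly when it conjugates the generator back into P.
        if compose(compose(s, cycle), inverse(s)) in P:
            count += 1
    return count

for p in (3, 5):
    assert normalizer_order(p) == p * (p - 1)
```
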


Lemma 1. Let $G$ be a transitive subgroup of $S_p$, and suppose $m_G > 1$. Then $r_G > 1$. 


Proof: Suppose $m_G > 1$ and $r_G = 1$. Then $G$ has exactly $m_G(p - 1) = n - m_G$ 
elements of order $p$. Each of these elements has no fixed points on $\{1, 2, \dots, p\}$. 
Hence $G$ has at most $m_G$ elements with fixed points. Each stabilizer $G_j$ of 
$j \in \{1, 2, \dots, p\}$ in $G$ consists of $m_G$ elements having at least one fixed point. It 
follows that $G_1 = G_2 = \cdots = G_p$, the set of all elements of $G$ with fixed points. 
This means that $G_1$ is trivial and so $m_G = 1$, contrary to hypothesis. □ 


We can now prove the simplicity of an interesting class of groups. 


Theorem 1. Let $G$ be a transitive subgroup of $S_p$ and suppose $|G| = pmr$ where 
$m > 1$, $m \equiv 1 \pmod p$, $r < p$ and $r$ is prime. Then $G$ is simple. 

Proof: We must have $r_G = r$ and $m_G = m$. Let $H$ be a non-trivial normal 
subgroup of $G$. It is easy to see that the orbits of $H$ on $\{1, 2, \dots, p\}$ are permuted 
by $G$. As $G$ is transitive and $H$ is non-trivial, all the orbits of $H$ must have the 
same size $s > 1$, so $s = p$ and $H$ is transitive. It follows that $P' \le H$ for some 
Sylow $p$-subgroup $P'$ of $G$. By Sylow’s second theorem all Sylow $p$-subgroups of $G$ 
are conjugate in $G$, and so $H$ contains all Sylow $p$-subgroups of $G$. Hence 
$m_H = m$ and $|H| = pmt$ where $t \mid r$. But $t > 1$ by the Lemma, and as $r$ is prime, 
$t = r$, $H = G$ and $G$ is simple. □ 


We recall briefly some facts about the Mathieu groups $M_{11}$, $M_{12}$, $M_{22}$, $M_{23}$ and 
$M_{24}$. These were the first sporadic simple groups to be discovered (by Mathieu in 
1861 and 1873) and are most easily defined as automorphism groups of certain 
combinatorial structures known as Steiner systems. For instance $M_{11}$ is the 
automorphism group of the (unique) Steiner system of type $S(4, 5, 11)$; this is a 
collection of 5-element subsets of an 11-element set $X$ with the property that each 
4-element subset of $X$ is contained in exactly one of the sets in the system. 
Similarly $M_{23}$ is the automorphism group of the (unique) Steiner system of type 
$S(4, 7, 23)$. For more details see chapter nine of Rotman’s book [1]. Rotman finds 
the orders of these groups; in particular $|M_{11}| = 7920 = 2^4 \cdot 3^2 \cdot 5 \cdot 11$ and 
$|M_{23}| = 10200960 = 2^7 \cdot 3^2 \cdot 5 \cdot 7 \cdot 11 \cdot 23$. 


Theorem 2. The Mathieu groups $M_{11}$ and $M_{23}$ are simple. 


Proof: The group $M_{11}$ is a transitive subgroup of $S_{11}$ of order $n = 7920$. Now 
$n/p = 720 \equiv 5 \pmod{11}$, so $r_G = 5$ and $m_G = 144 > 1$. By Theorem 1, $M_{11}$ is 
simple. 

Similarly the group $M_{23}$ is a transitive subgroup of $S_{23}$ of order $n = 10200960$. 
Now $n/p = 443520 \equiv 11 \pmod{23}$, so $r_G = 11$ and $m_G = 40320 > 1$. By Theorem 
1, $M_{23}$ is simple. □ 
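
The arithmetic in this proof is easy to re-check mechanically. The sketch below recomputes $r_G$ and $m_G$ from the group order and the prime $p$, then tests the numerical hypotheses of Theorem 1. The function names are hypothetical, and the code checks only the arithmetic; transitivity of the groups is taken as given, as it is in the text.

```python
def is_prime(k):
    """Trial-division primality test, adequate for the small values used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def sylow_data(order, p):
    """Return (r, m) with order = p * r * m, where r is the least positive
    residue of order / p modulo p, as in the text."""
    q, rem = divmod(order, p)
    if rem != 0:
        raise ValueError("p must divide the group order")
    r = q % p
    if r == 0 or q % r != 0:
        raise ValueError("order is not of the form p * r * m with these r, m")
    return r, q // r

def theorem1_hypotheses_hold(order, p):
    """Check m > 1, m ≡ 1 (mod p), r < p and r prime (Theorem 1)."""
    r, m = sylow_data(order, p)
    return m > 1 and m % p == 1 and r < p and is_prime(r)

assert sylow_data(7920, 11) == (5, 144)          # M_11
assert sylow_data(10200960, 23) == (11, 40320)   # M_23
assert theorem1_hypotheses_hold(7920, 11)
assert theorem1_hypotheses_hold(10200960, 23)
```
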


From the simplicity of $M_{11}$ and $M_{23}$ it is easy to deduce the simplicity of $M_{12}$ 
and $M_{24}$ (see Corollary 9.22 in [1]). 

UNSOLVED PROBLEMS 


Edited by: Richard Guy & Richard Nowakowski 


In this department the MONTHLY presents easily stated unsolved problems dealing 
with notions ordinarily encountered in undergraduate mathematics. Each problem 
should be accompanied by relevant references (if any are known to the author) and by 
a brief description of known partial or related results. Typescripts should be sent to 
Richard Guy, Department of Mathematics & Statistics, The University of Calgary, 
Alberta, Canada T2N 1N4. 


Wanted: A Bad Matrix 


Gary H. Meisters 


1. THE PROBLEM WITH BAD MATRICES. For vectors $x$ in $\mathbb{C}^n$, let $\mathrm{diag}(x)$ 
denote the diagonal matrix whose diagonal entries are the components of the 
vector $x$. A given $n \times n$ complex matrix $A$ serves as the kernel matrix for the 
matrix-valued bilinear function $\mathcal{B}(A)(x, y) = 3[\mathrm{diag}(Ax)][\mathrm{diag}(Ay)]A$ of the two 
vector variables $x$, $y$. Here’s the question: Is there an $n \times n$ matrix $A$ satisfying 
both of the following conditions? 


Cond 1. The matrix $\mathcal{B}(A)(x, x)$ is nilpotent for all $x$ in $\mathbb{C}^n$. (The matrix $A$ is 
admissible.) 


Cond 2. There are distinct vectors $x$ and $y$ in $\mathbb{C}^n$ such that $\mathcal{B}(A)(x, y)(x - y) = 
(x - y)$. (The matrix $A$ is odd.) 


Call a square matrix A satisfying both of these conditions a bad matrix. 


2. REMARKS ON THE GOOD, THE BAD, AND THE UGLY 


2.1. Ott-Heinrich Keller’s (1939) Jacobian Conjecture [1, 4, 8, 9] states that: If 
$\det[F'(x)] = 1$ for a polynomial mapping $F$, then $F$ is bijective with polynomial 
inverse. It suffices [2] to prove injectivity. It even suffices [3] to prove injectivity for 
the special “cubic-linear” maps $F(A)(x) := x - H(A)(x) = x - [\mathrm{diag}(Ax)]^2 Ax$. If 
the mapping $x \mapsto F(A)(x) := x - H(A)(x)$ is injective, call the matrix $A$ good. We 
proved in [10, §3.3 page 118] that $A$ is good if and only if Cond 2 is false; i.e., $A$ is 
good if and only if it is not odd. Cond 2 is certainly false if $\mathcal{B}(A)(x, y)$ is nilpotent 
for all $x, y$ (in which case we say the matrix $A$ is beautiful); for then all eigenvalues 
of $\mathcal{B}(A)(x, y)$ must be zero for all $x, y$. Cond 1 (admissibility) is equivalent to 
$\det[F'(A)(x)] = 1$, which is necessary for $F(A)$ to be injective. See [10, Lemma 
1(c) page 112 and Eq. (2.2) page 110]. Thus every beautiful matrix is good, and 
every good matrix is admissible. The question (Keller’s rephrased): Are there any 
admissible matrices that are odd (hence bad)? 


2.2. The matrix-valued bilinear function $\mathcal{B}(A)(x, y)$ has the following properties: 


(a) $\mathcal{B}(A)(x, y) = \mathcal{B}(A)(y, x)$ for all vectors $x, y$; 

(b) $\mathcal{B}(A)(x, y)z = \mathcal{B}(A)(x, z)y$ for all vectors $x, y, z$; 

(c) $\mathcal{B}(A)(x, x)$ is the Jacobian matrix $H'(A)(x)$ of the cubic-homogeneous 
mapping $H(A)(x) := [\mathrm{diag}(Ax)]^3 \mathbf{1}$, where $\mathbf{1}$ denotes the column 
$[1, 1, \dots, 1]^T$. 
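
Properties (a) to (c) can be spot-checked with exact integer arithmetic. The following sketch assumes the definition $\mathcal{B}(A)(x, y) = 3[\mathrm{diag}(Ax)][\mathrm{diag}(Ay)]A$ given in Section 1; the sample matrix and vectors are arbitrary illustrations, not taken from the text. For (c) it uses the fact that, for a cubic-homogeneous map $H$, the exact identity $H(x+v) - H(x-v) = 2H'(x)v + 2H(v)$ holds, so no numerical differentiation is needed.

```python
def mat_vec(M, x):
    """Matrix-vector product with plain lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def bilinear(A, x, y):
    """B(A)(x, y) = 3 diag(Ax) diag(Ay) A, as defined in Section 1."""
    ax, ay = mat_vec(A, x), mat_vec(A, y)
    n = len(A)
    return [[3 * ax[i] * ay[i] * A[i][j] for j in range(n)] for i in range(n)]

def H(A, x):
    """Cubic-homogeneous map H(A)(x) = [diag(Ax)]^3 1 (componentwise cubes of Ax)."""
    return [t ** 3 for t in mat_vec(A, x)]

def jacobian_times(A, x, v):
    """H'(A)(x) v, computed exactly from H(x+v) - H(x-v) = 2 H'(x) v + 2 H(v)."""
    plus = H(A, [a + b for a, b in zip(x, v)])
    minus = H(A, [a - b for a, b in zip(x, v)])
    hv = H(A, v)
    return [(p - m - 2 * h) // 2 for p, m, h in zip(plus, minus, hv)]

# Arbitrary sample data (assumptions for illustration only).
A = [[0, 1, 2], [0, 0, 1], [1, 0, 0]]
x, y, z = [1, 2, 3], [2, -1, 4], [0, 5, -2]

assert bilinear(A, x, y) == bilinear(A, y, x)                                # (a)
assert mat_vec(bilinear(A, x, y), z) == mat_vec(bilinear(A, x, z), y)        # (b)
assert mat_vec(bilinear(A, x, x), z) == jacobian_times(A, x, z)              # (c)
```
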


2.3. Here is a $15 \times 15$ matrix 

        [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 ]
        [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 ]
        [  0   0   0   4   2  -2  -2  -2   0   0   2   0   0   2   0 ]
        [  0   0   2   0   2   0  -1   0   0  -1   0   1   1   0   0 ]
        [  0   0  -2   4   0   0   0  -2   2   2   2   0   0   0  -2 ]
        [ -2   0  -2   4   0   0   0  -2   2   2   2   0   0   0  -2 ]
        [  0  -2  -2   4   0   0   0  -2   2   2   2   0   0   0  -2 ]
    A = [ -2   0   2   0   2   0  -1   0   0  -1   0   1   1   0   0 ]
        [ -2   0   0   4   2  -2  -2  -2   0   0   2   0   0   2   0 ]
        [  0  -2   0   4   2  -2  -2  -2   0   0   2   0   0   2   0 ]
        [ -2   0  -2   0  -2   0   1   0   0   1   0  -1  -1   0   0 ]
        [  0  -2   2  -4   0   0   0   2  -2  -2  -2   0   0   0   2 ]
        [  0  -2   0  -4  -2   2   2   2   0   0  -2   0   0  -2   0 ]
        [ -2  -2  -2   4   0   0   0  -2   2   2   2   0   0   0  -2 ]
        [ -2  -2   0   4   2  -2  -2  -2   0   0   2   0   0   2   0 ]


which satisfies Cond 1 but not Cond 2. In fact it is good but ugly (not beautiful). It 
has rank 5, nilpotent index 2, and $\mathcal{B}(A)(x, x)^5 = 0$. This example is a slight 
modification of the example given for another purpose on page 39 of [4]. It is easy 
to check that $\mathcal{B}(A)(e_1, e_2)$ is not nilpotent (so $A$ is ugly), and that $F(A)$ is 
injective (so that $A$ is good) [14]. It is harder to show that $\mathcal{B}(A)(x, x)$ has 
nilpotence-index 5 for all $x$ in $\mathbb{C}^{15}$: It suffices to show that the $13 \times 13$ lower-right 
block has nilpotence-index 4. This was checked by computer [12]. 


2.4. All $2 \times 2$ admissible matrices $A$ can be written as dyads 


for some complex numbers a and b. Furthermore, every such A is beautiful! 


2.5. The cubic-similarity equivalence relation $A \overset{cs}{\sim} D$: Call matrices $A$ and $D$ 
cubic-similar if there is an invertible matrix $P$ such that $[\mathrm{diag}(APu)]^3 \mathbf{1} = 
P[\mathrm{diag}(Du)]^3 \mathbf{1}$ for all $u \in \mathbb{C}^n$. All $2 \times 2$ admissible matrices are cubic-similar to the one 
representative 

    J(1.2) = [ 0  1 ]
             [ 0  0 ]

All $3 \times 3$ admissible matrices are cubic-similar to one of the two representatives 

             [ 0  1  0 ]                 [ 0  1  0 ]
    J(1.2) = [ 0  0  0 ]   or   J(2.3) = [ 0  0  1 ]
             [ 0  0  0 ]                 [ 0  0  0 ]

2.6. We gave six cubic-similarity representatives for $4 \times 4$ admissible matrices 
in [11]. 


             [ 0  1  0  0 ]            [ 0  1  0  0 ]            [ 0  0  1  0 ]
    J(1.2) = [ 0  0  0  0 ]   J(2.3) = [ 0  0  1  0 ]   N(2.3) = [ 0  0  1  1 ]
             [ 0  0  0  0 ]            [ 0  0  0  0 ]            [ 0  0  0  1 ]
             [ 0  0  0  0 ]            [ 0  0  0  0 ]            [ 0  0  0  0 ]

             [ 0  1  0  0 ]            [ 0  1  0  0 ]            [ 0  1  1  0 ]
    N(2.2) = [ 0  0  0  0 ]   J(3.4) = [ 0  0  1  0 ]   N(3.4) = [ 0  0  1  0 ]
             [ 0  0  0  1 ]            [ 0  0  0  1 ]            [ 0  0  0  1 ]
             [ 0  0  0  0 ]            [ 0  0  0  0 ]            [ 0  0  0  0 ]


They are admissible and mutually inequivalent with respect to cubic-similarity. 
Integers in their names denote the rank of $A$ and the nilpotence index of $\mathcal{B}(A)(x, x)$; 
both are cubic-similarity invariants [11, 13]. While these and the other representatives 
shown so far are nilpotent, not every admissible matrix $A$ need itself be 
nilpotent (see the dyads in 2.4); and not every nilpotent matrix is admissible. 
However, it follows from a result of Druzkowski [5] that every cubic-similarity 
equivalence class contains a nilpotent matrix. But the ugly 15-dimensional example 
in Remark 2.3 shows that not every admissible matrix $A$ is cubic-similar to a 
triangular matrix $T$, because the matrix $\mathcal{B}(T)(x, y)$ determined by a triangular 
matrix $T$ is itself triangular, hence nilpotent for all $x$ and $y$ (so both $T$ and $A$ 
would be beautiful). Admissibility, goodness, and beauty are all cubic-similarity 
invariants [13]. A computer-check by Engelbert Hubbers [7] verified that all $4 \times 4$ 
admissible matrices are cubic-similar to one of the above six representatives. 
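
The naming convention (rank of $A$, then nilpotence index of $\mathcal{B}(A)(x, x)$) can be checked with a few lines of exact arithmetic. In the sketch below, the six matrices are an assumed reading of the printed arrays, chosen to agree with their names, and the label N(2.2) for the fourth one is likewise an assumption. The nilpotence index is evaluated at one sample point $x$ with the relevant coordinates of $Ax$ nonzero, which is a spot check rather than a proof for all $x$.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def jacobian_at(A, x):
    """B(A)(x, x) = 3 [diag(Ax)]^2 A."""
    ax = [sum(a * b for a, b in zip(row, x)) for row in A]
    n = len(A)
    return [[3 * ax[i] ** 2 * A[i][j] for j in range(n)] for i in range(n)]

def nilpotence_index(M):
    """Smallest k >= 1 with M^k = 0, or None if M is not nilpotent."""
    P = M
    for k in range(1, len(M) + 1):
        if all(v == 0 for row in P for v in row):
            return k
        P = mat_mul(P, M)
    return None

def rank(A):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Assumed reading of the six 4 x 4 representatives: name -> (matrix, rank, index).
REPRESENTATIVES = {
    "J(1.2)": ([[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]], 1, 2),
    "J(2.3)": ([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]], 2, 3),
    "N(2.3)": ([[0, 0, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]], 2, 3),
    "N(2.2)": ([[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]], 2, 2),
    "J(3.4)": ([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]], 3, 4),
    "N(3.4)": ([[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]], 3, 4),
}

sample_x = [1, 2, 3, 5]  # arbitrary point with nonzero coordinates
for name, (A, expected_rank, expected_index) in REPRESENTATIVES.items():
    assert rank(A) == expected_rank, name
    assert nilpotence_index(jacobian_at(A, sample_x)) == expected_index, name
```
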


2.7. Here are four equivalent formulations of the cubic-similarity equivalence 
relation $A \overset{cs}{\sim} D$: There is an invertible matrix $P$ such that for all vectors $u, v$ in $\mathbb{C}^n$ 


(a) $[\mathrm{diag}(APu)]^3 \mathbf{1} = P[\mathrm{diag}(Du)]^3 \mathbf{1}$ 

(b) $[\mathrm{diag}(APu)]^2 AP = P[\mathrm{diag}(Du)]^2 D$ 

(c) $P^{-1}[H'(A)(Pu)]P = H'(D)(u)$ 

(d) $P^{-1}[\mathcal{B}(A)(Pu, Pv)]P = \mathcal{B}(D)(u, v)$ 


Note that $[\mathrm{diag}(Ax)]^3 \mathbf{1} = [\mathrm{diag}(Ax)]^2 Ax$. Differentiation of (a) gets (b); and 
multiplication of (b) on the right by the vector $u$ retrieves (a). 


2.8. For the matrix $A$ of 2.3 the characteristic polynomial of $\mathcal{B}(A)(x, y)$ is (from 
[16]) 

$$\det[tI - \mathcal{B}(A)(x, y)] = t^{15} + 576\,(x_1 y_2 - x_2 y_1)^2\, t^{13}.$$ 

It is easily seen from this and the Cayley-Hamilton Theorem that $\mathcal{B}(A)(x, x)$ is 
nilpotent for all vectors $x$, and that $\mathcal{B}(A)(x, y)$ is not nilpotent for some distinct $x$ 
and $y$. That is, the matrix $A$ of Remark 2.3 is admissible but ugly. This raises more 
questions: Which coefficients of the characteristic polynomial of $\mathcal{B}(A)(x, y)$ can 
be different from zero for admissible matrices $A$? (We know, for example, that if 
$A$ is admissible, then $\det A$ and $\det \mathcal{B}(A)(x, y)$ must both be zero.) What other 
examples of admissible-but-ugly matrices can be found in dimensions $n \ge 5$? What 
is the smallest dimension containing an ugly matrix? 


2.9. Finally, the existence of a real bad matrix $A$ would provide a counterexample 
to the 1960 Markus-Yamabe Conjecture on global asymptotic stability described in 
[9, 10, 15]. Around 1960, Lawrence Markus and Hidehiko Yamabe conjectured 
that every rest point $x_0$ of a nonlinear, class $C^1$, $n$-dimensional system of 
differential equations $dx/dt = V(x)$ is globally asymptotically stable if all the 
eigenvalues of the Jacobian matrix $V'(x)$ have strictly negative real parts at every 
point $x$ in $\mathbb{R}^n$. If there is an $n \times n$ real bad matrix $A$, then $H'(A)(x)^n = 0$ ($A$ is 
admissible) and the mapping $F(A)(x) := x - H(A)(x)$ is not injective ($A$ is not 
real good). Thus $F(A)(x_1) = F(A)(x_2)$ for two distinct points $x_1$ and $x_2$ in $\mathbb{R}^n$, so 
that both $x_1$ and $x_2$ are rest points of the system 

$$\frac{dx}{dt} = V(x) := -x + H(A)(x) + x_1 - H(A)(x_1)$$ 

because both $V(x_1) = 0$ and $V(x_2) = 0$. Now two distinct rest points cannot both 
be globally asymptotically stable. (A rest point is globally asymptotically stable only 
if all solutions tend to it as “time” $t$ tends to infinity.) However, $H'(A)(x)^n = 0$ 
implies that the Jacobian matrix $V'(x) = -I + H'(A)(x)$ has $-1$ for all its 
eigenvalues, so we are in violation of the Markus-Yamabe Conjecture. 


3. SUMMARY. We have introduced above the following four classes of complex 
square matrices $A$: 


1. $A$ is beautiful means $\mathcal{B}(A)(x, y)$ is nilpotent for all $x, y$. 
2. $A$ is not odd means there are no distinct vectors $x, y$ satisfying 
$\mathcal{B}(A)(x, y)(x - y) = (x - y)$. 
3. $A$ is good means the map $x \mapsto F(A)(x)$ is injective. 
4. $A$ is admissible means $\mathcal{B}(A)(x, x)$ is nilpotent for all $x$. 


We know that (1) $\Rightarrow$ (2) $\Leftrightarrow$ (3) $\Rightarrow$ (4). It is also known that (2) does not imply (1). 
The open question is this: Does (4) imply (2)? This is Keller’s Question rephrased. 


$$\{\text{beautiful}\} \subsetneq \{\text{not odd}\} = \{\text{good}\} \subseteq \{\text{admissible}\}.$$ 


$\{\text{bad}\} = \{\text{odd}\} \cap \{\text{admissible}\}$. Is $\{\text{bad}\}$ the empty set? 


4. ANOTHER QUESTION. Is there a beautiful matrix $A$ that is not cubic-similar 
to a triangular matrix (not CST)? It follows from 2.4-2.6 that all $2 \times 2$, $3 \times 3$, and $4 \times 4$ 
admissible matrices are CST. But the $15 \times 15$ example in 2.3 is a good matrix that 
is ugly (hence not CST). 

See the references for further details and many related questions. A screenplay 
written by Luciano Vincenzoni and Sergio Leone inspired our terminology. The 
author is grateful to the MONTHLY editors and referees for good suggestions that 
improved the exposition; and to Michael Neubauer who helped him improve it 
even further. 

PROBLEMS 


10459. Proposed by David Beckwith, Sag Harbor, NY. 


A game is played with n disks (n > 3), each having a black face and a red face. Initially, 
the n disks are arranged in a circle showing a random pattern of black and red faces. A move 
consists of taking away a black disk (i.e., one with its black face exposed) and inverting its 
neighbors (if any). The resulting gap is not closed up, so the remaining disks do not acquire 
new neighbors. The goal is to remove all the disks. For which initial patterns is this possible? 

10460. Proposed by Torleiv Kløve, University of Bergen, Bergen, Norway. 

Let $(b_1, b_2, \dots, b_{2n})$ be a permutation of $(1, 2, \dots, 2n)$ such that 
$(|b_2 - b_1|, |b_3 - b_2|, \dots, |b_{2n} - b_{2n-1}|)$ is a permutation of $(1, 2, \dots, 2n - 1)$. 
Show that 

$$\{b_2, b_4, \dots, b_{2n}\} = \{1, 2, \dots, n\} \quad \text{if and only if} \quad b_1 = b_{2n} + n.$$ 


1995] PROBLEMS AND SOLUTIONS 553 


10461. Proposed by Stephen J. Hershkorn, Rutgers University, New Brunswick, NJ. 


A random variable $X$ is said to be symmetric about a real number $c$ if $X - c$ and $c - X$ 
have the same distribution. Show that $X$ is symmetric about $c$ if and only if 

$$\int_c^{\infty} \Pr(|X - u| < a)\, du = a$$ 

for all $a > 0$. 


10462. Proposed by Igor Rivin, Melbourne University, Parkville, Victoria, Australia. 

Let $\Delta$ and $\Delta'$ be nondegenerate simplices in $E^n$, with $(n - 1)$-dimensional faces $F_i$ and 
$F_i'$ respectively ($i = 0, \dots, n$). Let $\alpha_{ij}$ be the dihedral angle between $F_i$ and $F_j$, and let $\alpha_{ij}'$ be 
the dihedral angle between $F_i'$ and $F_j'$ ($i \ne j$). Prove that if $\alpha_{ij} \ge \alpha_{ij}'$ for all $i$ and $j$ with 
$0 \le i < j \le n$, then $\Delta$ and $\Delta'$ are similar. 


10463. Proposed by F. S. Cater, Portland State University, Portland, OR. 


Let $F_0$ be a subfield of the field $F_1$. Let $X$ be a (possibly infinite) set and let $S$ be a subset 
of the vector space $F_0^X$ of all functions from $X$ to $F_0$. If $S$ is linearly dependent over $F_0$, 
then $S$ is also linearly dependent over $F_1$ as a subset of $F_1^X$. Prove or disprove the converse 
statement: if $S$ is linearly dependent over $F_1$, then $S$ must be linearly dependent over $F_0$. 


10464. Proposed by Hillel Gauchman, Eastern Illinois University, Charleston, IL, and Lee A. 
Rubel, University of Illinois, Urbana, IL. 


Let $z = e^{xy}$, and let $W_n$ be the $n$ by $n$ matrix whose $(i, j)$ entry (for $0 \le i, j < n$) is 

$$\frac{\partial^{i+j} z}{\partial x^i\, \partial y^j}.$$ 

Evaluate $\det W_n$. 


10465. Proposed by Paul K. Stockmeyer, College of William and Mary, Williamsburg, VA. 


As the Minister of Finance of a newly independent country, it is your job to design a 
new currency: a sequence $d_1 < d_2 < d_3 < \cdots$ of positive integers, with $d_1 = 1$, to be the 
denominations of various coins and bills. Although you are authorized to create an infinite 
number of denominations, the legislature has passed some laws restricting your choices. 

Rule 1: There must be a bound $b$ on the number of items needed for any payment. 

Rule 2: The “denomination density”, $\lim_{k \to \infty} k/d_k$, must be zero. 

Rule 3: Repeatedly choosing the largest denomination less than or equal to the amount remaining 
to be paid (the greedy algorithm) always leads to the use of the minimal number of items to 
pay any amount. 

Can you design a currency meeting these rules? 


SOLUTIONS 


Early Returns in a Tied Election 


10248 [1992, 781]. Proposed by Michael B. Handelsman, Erasmus Hall High School, Brook- 
lyn, NY. 


Candidates Smith and Jones are the only two contestants in an election that will be 
deadlocked when all the votes are counted—each will receive 2n of the 4n votes cast. The 
ballot count is carried out with successive random selections from a single container. After 
exactly $2n$ votes are tallied, Smith has $S$ votes and Jones has $J$ votes. What is the expected 
value of $|S - J|$? 


Solution I by Victor Hernandez and Ricardo Vélez, Universidad Nacional de Educacion a 
Distancia, Madrid, Spain. The answer is $n \binom{2n}{n}^2 \binom{4n}{2n}^{-1}$. More generally, suppose that each 
candidate will receive $N$ of $2N$ votes cast and that when exactly $k$ votes are tallied, Smith 
has $S_k$ votes and Jones has $J_k$ votes. Let $X_k = |S_k - J_k|$. When $X_k = j$, the leader has 
$(k + j)/2$ votes, and the trailer has $(k - j)/2$ votes. Among the remaining votes, there are 
$(2N - k + j)/2$ for the trailer and $(2N - k - j)/2$ for the leader. 

This allows us to compute conditional expectations. Given $X_{k-1} = j$, the value of $X_k$ 
must be $j + 1$ or $j - 1$. If $j > 0$, then 

$$E(X_k \mid X_{k-1} = j) = (j + 1)\,\frac{N - (k - 1 + j)/2}{2N - k + 1} + (j - 1)\,\frac{N - (k - 1 - j)/2}{2N - k + 1} = j\,\frac{2N - k}{2N - k + 1},$$ 

while $E(X_k \mid X_{k-1} = 0) = 1$. We can also compute conditional expectation in the other 
direction. Since the sequence of votes is random, $\Pr(X_k = j) = \Pr(X_{2N-k} = j)$, and 
$\Pr(X_{k-1} = i \mid X_k = j) = \Pr(X_{2N-k+1} = i \mid X_{2N-k} = j)$. Hence, for $j > 0$, 

$$E(X_{k-1} \mid X_k = j) = E(X_{2N-k+1} \mid X_{2N-k} = j) = j\,\frac{k - 1}{k}.$$ 

Now let $p_k = \Pr(X_k = 0)$, so $p_k = 0$ if $k$ is odd, and $p_k = \binom{N}{k/2}^2 \binom{2N}{k}^{-1}$ if $k$ is even. 
Conditioning on $X_k$ for the computation of $E(X_{k-1})$, we have 

$$\sum_{j=1}^{k-1} j \Pr(X_{k-1} = j) = E(X_{k-1}) = p_k + \frac{k - 1}{k} \sum_{j=1}^{k} j \Pr(X_k = j) = p_k + \frac{k - 1}{k}\, E(X_k).$$ 

Next we use this and conditioning on $X_{k-1}$ to compute 

$$E(X_k) = p_{k-1} + \frac{2N - k}{2N - k + 1} \sum_{j=1}^{k-1} j \Pr(X_{k-1} = j) = p_{k-1} + \frac{2N - k}{2N - k + 1} \left[ p_k + \frac{k - 1}{k}\, E(X_k) \right].$$ 

Solving for $E(X_k)$, we have $E(X_k) = \frac{k(2N - k + 1)}{2N}\, p_{k-1} + \frac{k(2N - k)}{2N}\, p_k$. With the formula 
for $p_k$, this yields 

$$E(X_k) = \begin{cases} \dfrac{k(2N - k)}{2N} \dbinom{N}{k/2}^{2} \dbinom{2N}{k}^{-1} & k \text{ is even,} \\[2ex] \dfrac{k(2N - k + 1)}{2N} \dbinom{N}{(k-1)/2}^{2} \dbinom{2N}{k-1}^{-1} & k \text{ is odd.} \end{cases}$$ 

To obtain the result stated at the outset, set $k = N = 2n$. 
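
The closed form can be compared against a direct computation from the hypergeometric distribution. The sketch below is only a numerical cross-check with exact rational arithmetic; the function names are hypothetical, and it assumes the reading of the formula in which the even case uses $p_k$ and the odd case uses $p_{k-1}$.

```python
from fractions import Fraction
from math import comb

def expected_lead_direct(N, k):
    """E|S_k - J_k| by summing over the hypergeometric distribution of S_k."""
    total = comb(2 * N, k)
    acc = 0
    for s in range(max(0, k - N), min(N, k) + 1):
        # s votes for Smith and k - s for Jones among the first k tallied.
        acc += abs(2 * s - k) * comb(N, s) * comb(N, k - s)
    return Fraction(acc, total)

def expected_lead_formula(N, k):
    """Closed form for E(X_k) from Solution I."""
    if k % 2 == 0:
        return Fraction(k * (2 * N - k) * comb(N, k // 2) ** 2,
                        2 * N * comb(2 * N, k))
    return Fraction(k * (2 * N - k + 1) * comb(N, (k - 1) // 2) ** 2,
                    2 * N * comb(2 * N, k - 1))

# The two computations agree for every small case tried here.
for N in range(1, 9):
    for k in range(1, 2 * N + 1):
        assert expected_lead_direct(N, k) == expected_lead_formula(N, k)

# Original problem: k = N = 2n gives n * C(2n, n)^2 / C(4n, 2n).
for n in range(1, 6):
    assert expected_lead_formula(2 * n, 2 * n) == Fraction(
        n * comb(2 * n, n) ** 2, comb(4 * n, 2 * n))
```

For instance, with $n = 1$ (four votes, two tallied) both computations give $2/3$.
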


Solution II by Richard Holzsager, The American University, Washington, DC. Start with the 
easily checked identity 


val =a IC)-CICTI 
Then note that 


E(Xi) = at 2H) (‘) (." 7 / (" 
EVO) CaG AC 


which telescopes to give 


N-1 N-1 2N 
ECAR) = 20 (O01) (veya - : / ( k ) 


A straightforward manipulation of binomial coefficients shows this formula to be equivalent 
to the one given in Solution I. Also, when $k = N = 2n$ as in the original statement, Stirling’s 
formula shows that $E(X_k)$ is asymptotic to $\sqrt{2n/\pi}$. 


Editorial comment. None of the other submitted solutions provided as general a result as 
these, although the proposer did calculate $E(X_k)$ for $k$ even. However, several other solvers 
observed the asymptotic result obtained from Stirling’s formula. Both R. Daniel Hurwitz 
and Robert J. Wagner pointed out the similarity of this problem to Problem 436 in College 
Math. J. [1990, 423; 1991, 444] by the same proposer. That problem involved the binomial 
distribution where this one involved the hypergeometric distribution. Both of the incorrect 
solutions used the wrong distribution, thereby solving that problem instead of the one posed 
here. 


Solved also by D. M. Bloom, M. Bowron, R. J. Chapman (U. K.), D. A. Darling, Z. Franco, N. N. Gurwell & E. D. 
Onstott, V. Hernandez & R. Vélez (Spain), R. D. Hurwitz, I. Kastanas, P. G. Kirmser, K. McInturff, G. Schillinger, 
E. Schmeichel, F. Schmidt, G. L. Stanek, M. Vowe (Switzerland), R. J. Wagner, H. Widmer (Switzerland), Anchorage 
Math Solutions Group, and the proposer. Two incorrect solutions were received. 


A Mean Limit 
10259 [1992, 873]. Proposed by Jonathan L. King, University of Florida, Gainesville, FL. 


Let $(r_k)$ for $k \in \mathbb{N}$ be defined by $r_0 = 3$ and $r_{k+1} = r_k^2 - 2$. Evaluate 

$$\lim_{K \to \infty} \left( \prod_{k=0}^{K-1} r_k \right)^{1/2^K}.$$ 


Solution I by Yan Loi Wong, National University of Singapore, Singapore. The value is 
$(3 + \sqrt{5})/2$. Let $a$ be the positive number such that $\cosh a = 3/2$. By using induction and the 
identity $\cosh 2x = 2\cosh^2 x - 1$, we have $r_k = 2\cosh(2^k a)$. From the identity $\sinh 2x = 
2 \sinh x \cosh x$, we have 

$$\prod_{k=0}^{K-1} r_k = (2\cosh a)(2\cosh 2a) \cdots (2\cosh 2^{K-1} a) = \frac{\sinh(2^K a)}{\sinh a}.$$ 

Hence the required limit is 

$$\lim_{K \to \infty} \left( \frac{\sinh(2^K a)}{\sinh a} \right)^{1/2^K}.$$ 

By L’Hôpital’s Rule, this limit equals $e^a = (3 + \sqrt{5})/2$. 

Solution II by Robin John Chapman, University of Exeter, Exeter, U.K. Letting $\alpha = (3 + 
\sqrt{5})/2$, we have $r_0 = \alpha + \alpha^{-1}$. The recurrence then yields $r_k = \alpha^{2^k} + \alpha^{-2^k}$ by induction. 
Since this equals $(\alpha^{2^{k+1}} - \alpha^{-2^{k+1}}) / (\alpha^{2^k} - \alpha^{-2^k})$, we have 

$$\prod_{k=0}^{K-1} r_k = \frac{\alpha^{2^K} - \alpha^{-2^K}}{\alpha - \alpha^{-1}}.$$ 

The $2^K$th root of this is $\alpha \left( \dfrac{1 - \alpha^{-2^{K+1}}}{\alpha - \alpha^{-1}} \right)^{1/2^K}$, which converges to $\alpha$ as $K \to \infty$. 

More generally, if $r_0 > 2$ is a real number, then a similar argument proves that the limit 
is $\left( r_0 + \sqrt{r_0^2 - 4} \right)/2$. 


Solution III by Curtis Cooper, Central Missouri State University, Warrensburg, MO. We use 
the Fibonacci numbers $F_n$ and the Lucas numbers $L_n$, defined by 

$$F_{n+2} = F_{n+1} + F_n \text{ for } n \ge 0; \quad F_0 = 0,\ F_1 = 1,$$ 
$$L_{n+2} = L_{n+1} + L_n \text{ for } n \ge 0; \quad L_0 = 2,\ L_1 = 1.$$ 

With $\alpha = (1 + \sqrt{5})/2$ and $\beta = (1 - \sqrt{5})/2$, we have the classical formulas $F_n = (\alpha^n - 
\beta^n)/\sqrt{5}$ and $L_n = \alpha^n + \beta^n$, and hence $L_n F_n = F_{2n}$. By induction on $k$ we have $r_k = L_{2^{k+1}}$ 
and then, by induction on $K$, $\prod_{k=0}^{K-1} r_k = F_{2^{K+1}}$. Therefore, 

$$\lim_{K \to \infty} \left( \prod_{k=0}^{K-1} r_k \right)^{1/2^K} = \alpha^2 = (3 + \sqrt{5})/2.$$ 
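
The two inductive identities in Solution III, and the slow convergence of the root of the product, are easy to observe numerically. The sketch below uses exact integers for the identities and logarithms for the limit; the helper names are hypothetical, and the final comparison is only approximate (the error decays roughly like $1/2^K$).

```python
import math

def fib_lucas(n):
    """Return (F_n, L_n) by simple iteration."""
    f, f_next = 0, 1   # F_0, F_1
    l, l_next = 2, 1   # L_0, L_1
    for _ in range(n):
        f, f_next = f_next, f + f_next
        l, l_next = l_next, l + l_next
    return f, l

def r_sequence(count):
    """Return [r_0, ..., r_{count-1}] with r_0 = 3 and r_{k+1} = r_k^2 - 2."""
    r = [3]
    while len(r) < count:
        r.append(r[-1] ** 2 - 2)
    return r

def root_of_product(K):
    """(r_0 r_1 ... r_{K-1})^(1/2^K), computed via logarithms to avoid overflow."""
    log_sum = sum(math.log(r) for r in r_sequence(K))
    return math.exp(log_sum / 2 ** K)

# Exact checks: r_k = L_{2^(k+1)} and r_0 ... r_{K-1} = F_{2^(K+1)}.
rs = r_sequence(6)
for k, rk in enumerate(rs):
    assert rk == fib_lucas(2 ** (k + 1))[1]
for K in range(1, 7):
    assert math.prod(rs[:K]) == fib_lucas(2 ** (K + 1))[0]

# Approximate check of the limit (3 + sqrt(5)) / 2.
limit = (3 + math.sqrt(5)) / 2
assert abs(root_of_product(16) - limit) < 1e-4
```
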


Editorial comment. Variations of this problem recently appeared elsewhere. Several 
solvers mentioned Problem 1393, Mathematics Magazine [1993, 127] which in our notation 
requests a formula for the product $\prod_{k=0}^{K-1} r_k$, for general $r_0 > 2$. Three solutions are presented, 
but the nice connection to Fibonacci and Lucas numbers in the case $r_0 = 3$ is not 
evident. Related references include Problem E3036 of this MONTHLY [1987, 789]. Tareg 
Alnaffouri reported two published solutions to an equivalent problem for $r_0 > 2$ in Problem 
B-698, Fibonacci Quarterly [1992, 369]. 

The product in Solution II leads to an efficient computation of $F_{2^{K+1}}$, and can be modified 
to compute other large Fibonacci numbers. See Paul Cull and James L. Holloway, Computing 
large Fibonacci numbers quickly, Info. Proc. Letters 32(1989), 143-149 for more 
information. 

Jonathan Borwein observed that the existence of $\lim (r_k)^{1/2^k}$ guarantees that the more 
complicated expression has the same limit. This allows the result to generalize to other 
recurrences. For example, for $r_0 = A > 2$ and $r_{k+1} = r_k^3 - 3r_k$, we obtain 

$$\lim_{K \to \infty} \left( \prod_{k=0}^{K-1} r_k \right)^{1/3^K} = \left( \frac{A + \sqrt{A^2 - 4}}{2} \right)^{1/2}.$$ 

P.-G. Becker and W. Bergweiler, “Transcendancy of local conjugacies in complex dynamics 
and transcendancy of their values” (submitted) characterizes polynomials $r(x)$ of 
degree $d$ such that there is an algebraic function $u(z)$, behaving like $Az$ as $z \to \infty$, with 
$r(u(z)) = u(z^d)$. This property of $r(x) = x^2 - 2$ with $u(z) = z + z^{-1}$ was used in Solution 
II. 


Solved correctly by 48 readers and the proposer. Three incorrect solutions were received. 


An Insufficient Condition for Primality 


10268 [1992, 958]. Proposed by Ondrej Such (student), Queens University, Kingston, Ontario, 
Canada. 


Define a sequence $(a_n)$ for $n \in \mathbb{N}$ by 

$$a_0 = 3, \quad a_1 = 0, \quad a_2 = 2,$$ 
$$a_{n+3} = a_{n+1} + a_n \quad (n \in \mathbb{N}).$$ 

If $p$ is a prime, show that $p \mid a_p$. 


Solution by Anchorage Math Solutions Group, University of Alaska, Anchorage, AK. Since 
the roots $\alpha, \beta, \gamma$ of $r^3 = r + 1$ are distinct, the general solution of the recurrence is 
$c_1 \alpha^n + c_2 \beta^n + c_3 \gamma^n$. Since 

$$\alpha^2 + \beta^2 + \gamma^2 = (\alpha + \beta + \gamma)^2 - 2(\alpha\beta + \beta\gamma + \alpha\gamma) = 0^2 - 2(-1) = 2,$$ 

we see that $c_1 = c_2 = c_3 = 1$ fits the initial values, and hence $a_n = \alpha^n + \beta^n + \gamma^n$. 
Since $\alpha$, $\beta$ and $\gamma$ satisfy $x^3 = x + 1$, we have 

$$a_{3n} = (\alpha + 1)^n + (\beta + 1)^n + (\gamma + 1)^n = \sum_{k=0}^{n} \binom{n}{k} a_k.$$ 

Similarly, $a_n = \sum_{k=0}^{n} (-1)^k \binom{n}{k} a_{3k}$. Since $p \mid \binom{p}{k}$ for $1 \le k \le p - 1$, and $\binom{p}{0} = \binom{p}{p} = 1$, 
this gives 

$$a_p \equiv a_0 - a_{3p} \equiv a_0 - (a_0 + a_p) \equiv -a_p \pmod p$$ 

when $p$ is odd. Since the claim for $p = 2$ is part of the initial conditions, this implies the 
result. 


Editorial comment. The large majority of solvers obtained the general solution and then 
reduced it to $(\alpha + \beta + \gamma)^p$ modulo $p$. Some solved the recurrence using generating functions. 
Explicit formulas for $a_n$ are already known. Allan Pedersen cites “Girard’s formula” from 
Encyklopädie der Mathematischen Wissenschaften, Leipzig, 1898-1904, I B3b, p. 451, which 
in this special case gives $a_n = n \sum (i + j - 1)!/(i!\,j!)$, taking the sum over $i, j \ge 0$ such 
that $2i + 3j = n$. Istvan Nemes obtained the same formula from “Waring’s formula”, citing 
Jordan’s Calculus of Finite Differences (reprinted by Chelsea, 1965). 

Michael W. Vranos noted that the first composite $n$ such that $n \mid a_n$ is $521^2$: this was the 
only example known to the proposer. Using matrix methods, Kurt Foster found that $n \mid a_n$ 
also for $n = 821 \cdot 1231 \cdot 6971 = 7045248121$ and $211 \cdot 3571 \cdot 9661 = 7279379941$. For 
these values of $n$ and positive integer $k$, $a_{kn} \equiv a_k \bmod n$. 
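
The divisibility $p \mid a_p$, and the composite exception $521^2 = 271441$ mentioned above, can be observed directly by iterating the recurrence modulo $n$. The sketch below is a plain linear-time iteration, not the matrix method attributed to Kurt Foster; the function names are hypothetical, and the search range of 2000 is an arbitrary choice for illustration.

```python
def perrin_mod(n, m):
    """Return a_n modulo m, where a_0 = 3, a_1 = 0, a_2 = 2, a_{k+3} = a_{k+1} + a_k."""
    a, b, c = 3, 0, 2                    # (a_k, a_{k+1}, a_{k+2}) starting at k = 0
    for _ in range(n):
        a, b, c = b, c, (a + b) % m      # advance the window by one index
    return a % m

def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# For 2 <= n < 2000, n divides a_n exactly when n is prime.
for n in range(2, 2000):
    assert (perrin_mod(n, n) == 0) == is_prime(n)

# The composite 521^2 nevertheless divides a_n for n = 521^2.
n = 521 ** 2
assert n == 271441 and not is_prime(n)
assert perrin_mod(n, n) == 0
```
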

H.-J. Seiffert observed that the problem is a special case of a problem in Fibonacci 
Quarterly (31(1993)2, 188) by Paul S. Bruckman; the solution has not yet appeared. L. 
Van Hamme noted that a theorem in C. Smyth, “A coloring proof of a generalization of 
Fermat’s little theorem,” this MONTHLY, 93(1986), 469-471, implies the stronger result 
that > aj, @a(n/d) = 0 mod n, where y is the Mébius function. Frank Schmidt cites a 
solution, along with the examples of composite n including those above, by William W. 
Adams and Daniel Shanks in “Strong primality tests that are not sufficient,” Mathematics 
of Computation 39(1982), 255-300. These authors trace consideration of the problem to R. 
Perrin in 1899. 

Michael Stoll proved more generally that if $f$ is a monic polynomial of degree $m$ with
integer coefficients and roots $r_1, \ldots, r_m$, and $a_n = \sum_{i=1}^{m} r_i^n$, then $p \mid (a_p - a_1)$ for every
prime $p$. Fermat’s theorem is the case $m = 1$.


Solved by 50 readers and the proposer. 


A Characterization of Small Symmetric Groups 


10270 [1992, 958]. Proposed by Marian Deaconescu, University of Timisoara, Timisoara, 
Romania. 


Prove that a finite group $G$ has the property
$$N_G(H)/C_G(H) \cong \mathrm{Aut}(H)$$
for all subgroups $H$ if and only if $G$ is isomorphic to one of the groups $S_n$ for $n \le 3$.


Solution I by National Security Agency Problems Group, Fort Meade, MD. Given a finite
group $X$ and a prime $p$, we let $|X|_p$ denote the highest power of $p$ that divides the order
of $X$. Also $\mathrm{Inn}(X)$ denotes the group of inner automorphisms of $X$ and $Z(X)$ denotes the
center of $X$.


Step 1. All Sylow subgroups of G are cyclic of prime order. 


Let $P$ be a Sylow subgroup of $G$, and suppose that $|P|_p = p^e$ for $e \ge 2$. By a
theorem of Gaschütz (see Suzuki, Group Theory, Theorem 8.14), the index of the group of
inner automorphisms of $P$ in the group of all automorphisms of $P$ is divisible by $p$. By
hypothesis, the automorphism group of $P$ is isomorphic to $N_G(P)/C_G(P)$, and so

$$|\mathrm{Aut}(P)|_p = \frac{|N_G(P)|_p}{|C_G(P)|_p} \le \frac{|N_G(P)|_p}{|Z(P)|_p} = \frac{|P|_p}{|Z(P)|_p} = |P/Z(P)|_p = |\mathrm{Inn}(P)|_p,$$

contradicting Gaschütz’ theorem. Therefore, $e = 1$ as claimed.


Step 2. G does not have any cyclic subgroups of composite order. 


Suppose $H$ is a cyclic subgroup of order $m$. If $m$ is divisible by two odd primes,
then $|\mathrm{Aut}(H)| = \varphi(m) \equiv 0 \bmod 4$. By hypothesis, $\mathrm{Aut}(H) \cong N_G(H)/C_G(H)$. Since
$|N_G(H)/C_G(H)|$ divides $|G|$, it follows that 4 divides $|G|$, contradicting Step
1. If $m = 2p$ for an odd prime $p$, then $C_G(H) \supseteq H$ implies that $|N_G(H)/C_G(H)|$ is odd,
but $|\mathrm{Aut}(H)| = \varphi(m) = p - 1$ is even.


Step 3. |G| < 6. 


If the Sylow subgroups of a finite group $G$ are cyclic, then either $G$ is cyclic or $G$ is
metacyclic and is generated by two elements $a$ and $b$ with the defining relations $a^m = 1$,
$b^n = 1$, $b^{-1}ab = a^r$, where $mn = |G|$, $\gcd((r - 1)n, m) = 1$ and $r^n \equiv 1 \bmod m$ (see Hall,
Group Theory, Theorem 9.4.3). If $G$ is cyclic, then Step 2 implies that $G$ has prime order or
is trivial. In either case $\mathrm{Aut}(G) \cong N_G(G)/C_G(G)$ reduces to the trivial group, and so $G$
must be trivial or cyclic of order 2. If $G$ is not cyclic, then $m$ and $n$ must be prime and $|G|$
is a product of two primes. If $|G|$ is not divisible by any odd prime, then $|G| \le 2$ by Step 1.
If $p$ is the smallest odd prime dividing $|G|$, and $P$ is a $p$-Sylow subgroup, then $|G|$ is also
divisible by $|\mathrm{Aut}(P)| = p - 1$, which forces $p = 3$.

We conclude that $|G| \in \{1, 2, 6\}$. Since $G$ cannot be cyclic of order 6, the remaining
possibility for $G$ is $S_3$. It is easy to check that $S_3$ does satisfy the hypothesis, so $G$ must be
$S_n$ for some $n \le 3$, as desired.


Solution II by F. Schmidt, Arlington, VA. As in Step 1 of Solution I, we conclude that $|G|$
is square-free. Also note that the number of automorphisms of a cyclic group of order $n$ is the
number of positive integers less than $n$ that are relatively prime to $n$, which is even if $n > 2$.

For any prime $p$ dividing $|G|$, we have a cyclic subgroup $H$ of order $p$. Since $|\mathrm{Aut}(H)| = p - 1$,
we conclude from the hypothesis on $G$ that $p - 1$ divides $|G|$. Now the order of $G$
is confined to the set $T$ of square-free positive integers $s$ such that $p \mid s$ implies $(p - 1) \mid s$ for
each prime $p$. If $\prod_{i=1}^{k} p_i \in T$, then also $\prod_{i=1}^{k-1} p_i \in T$, where $p_k$ is the largest prime in the
product. Since $2\cdot3\cdot7\cdot43+1$ is not prime, we conclude that $T = \{1, 2, 2\cdot3, 2\cdot3\cdot7, 2\cdot3\cdot7\cdot43\}$.
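
The description of $T$ can be spot-checked by brute force. The sketch below simply tests the defining property (square-free, and $p \mid s$ implies $(p-1) \mid s$) for every $s$ up to a search bound; the bound `LIMIT` and the helper names are choices made for this illustration, and a finite search of course does not replace the argument in the text.

```python
def prime_factors(s):
    """Prime factorization of s as a list with multiplicity (trial division)."""
    factors, d = [], 2
    while d * d <= s:
        while s % d == 0:
            factors.append(d)
            s //= d
        d += 1
    if s > 1:
        factors.append(s)
    return factors


def in_T(s):
    """True if s is square-free and (p - 1) divides s for every prime p dividing s."""
    factors = prime_factors(s)
    square_free = len(factors) == len(set(factors))
    return square_free and all(s % (p - 1) == 0 for p in factors)


LIMIT = 5000  # arbitrary search bound for this sketch
members = [s for s in range(1, LIMIT + 1) if in_T(s)]
print(members)
```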

If $|G| \in \{1, 2, 6\}$, then $G \in \{S_1, S_2, S_3\}$, since the cyclic group of order 6 does not satisfy
the hypothesis. For the two other possibilities for $|G|$, let $p$ be the largest prime dividing
$|G|$, and let $P$ be a $p$-Sylow subgroup of $G$. Since the product of the other primes dividing
$|G|$ is $p - 1$, $\mathrm{Aut}(P)$ is the cyclic group $Z$ of order $p - 1$. By the hypothesis, we have
$\mathrm{Aut}(P) \cong Z \cong N_G(P)/C_G(P)$, which implies that $N_G(P) = G$ and $C_G(P) = P$. Hence
$G/P \cong Z$. Therefore $G$ contains an element of order $p - 1$, which generates a cyclic
subgroup $H$. Now $\mathrm{Aut}(H) \cong N_G(H)/C_G(H)$ is impossible, since the left side has even
order and the right side does not.


Solved also by S. M. Gagola Jr., D. B. Tyler, and the proposer.


Minimal Polynomials for Irrational Numbers


10272 [1992, 958]. Proposed by J. Marshall Ash and Leonid Krop, DePaul University, Chicago,
IL.


Show that $\sqrt[n]{2} + \sqrt[n]{3}$ is irrational for $n = 2$, 3, and 4 and find the minimal polynomials
that these quantities satisfy.


Solution I by National Security Agency Problems Group, Fort Meade, MD. For any integer
$n \ge 2$, $\sqrt[n]{2} + \sqrt[n]{3}$ is an algebraic integer. Therefore, if it is also a rational number, it must
be a rational integer. However, $\sqrt[n]{2} + \sqrt[n]{3}$ is between 3 and 4 for $n = 2$ and between 2 and
3 for larger $n$. Hence it cannot be rational.

Now we describe a method for computing the minimal polynomials of the quantities
$\sqrt[n]{2} + \sqrt[n]{3}$, for $n \ge 2$. Define sets $A_0, \ldots, A_{n-1}$ as follows: $A_i$ consists of the integers
$2^j 3^k$ with $j + k \equiv i \bmod n$, for $0 \le j, k \le n - 1$. Denote the elements of $A_i$ by
$a_{i1}, \ldots, a_{in}$. For $0 \le i \le n - 1$, let $B_i$ be the $Z$-module with basis elements $b_{i1}, \ldots, b_{in}$,
where $b_{ij} = \sqrt[n]{a_{ij}}$. Multiplication by $\sqrt[n]{2} + \sqrt[n]{3}$ effects a mapping from $B_i$ to $B_{i+1}$
(subscripts considered modulo $n$), which can be given by an $n \times n$ matrix $C_i$. Each matrix
$C_i$ contains exactly two nonzero elements in each row and column. The nonzero elements
are all one except for a single 2 and a single 3. The product $C = C_{n-1} \cdots C_0$ is the
matrix for mapping $B_0$ to itself by $(\sqrt[n]{2} + \sqrt[n]{3})^n$. Therefore, the minimal polynomial
for $(\sqrt[n]{2} + \sqrt[n]{3})^n$ divides the characteristic polynomial $p(x)$ of the matrix $C$. Hence the
minimal polynomial for $\sqrt[n]{2} + \sqrt[n]{3}$ divides $p(x^n)$. Using this technique, we find the
following minimal polynomials:


$n = 2$:  $x^4 - 10x^2 + 1$

$n = 3$:  $x^9 - 15x^6 - 87x^3 - 125$

$n = 4$:  $x^{16} - 20x^{12} - 666x^8 - 3860x^4 + 1$

n=5: x — 5x*9 — 140x) — 460x!° + 35x° — 1 


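For example, when $n = 3$ we have $A_0 = \{1, 12, 18\}$, $A_1 = \{2, 3, 36\}$, $A_2 = \{4, 6, 9\}$,
and


$$C_0 = \begin{pmatrix} 1 & 0 & 3 \\ 1 & 2 & 0 \\ 0 & 1 & 1 \end{pmatrix},\quad
C_1 = \begin{pmatrix} 1 & 0 & 3 \\ 1 & 1 & 0 \\ 0 & 1 & 2 \end{pmatrix},\quad
C_2 = \begin{pmatrix} 2 & 0 & 3 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix},\quad
C = \begin{pmatrix} 5 & 18 & 18 \\ 3 & 5 & 9 \\ 3 & 6 & 5 \end{pmatrix}.$$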
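
The $n = 3$ case can be reproduced with a few lines of integer arithmetic. The sketch below multiplies the three matrices in the order $C_2 C_1 C_0$ and computes the characteristic polynomial of the product with the Faddeev-LeVerrier recursion; that recursion is a choice made for this illustration (the solvers do not say how they computed $p(x)$), and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Product of two square integer matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def charpoly(M):
    """Coefficients of det(xI - M), leading coefficient first (Faddeev-LeVerrier)."""
    n = len(M)
    coeffs = [1]
    aux = [[0] * n for _ in range(n)]   # auxiliary matrix, starts at zero
    c = 1
    for k in range(1, n + 1):
        aux = matmul(M, aux)
        for i in range(n):
            aux[i][i] += c              # aux_k = M * aux_{k-1} + c_{k-1} * I
        prod = matmul(M, aux)
        c = -sum(prod[i][i] for i in range(n)) // k   # division is exact
        coeffs.append(c)
    return coeffs


C0 = [[1, 0, 3], [1, 2, 0], [0, 1, 1]]
C1 = [[1, 0, 3], [1, 1, 0], [0, 1, 2]]
C2 = [[2, 0, 3], [1, 1, 0], [0, 1, 1]]

C = matmul(C2, matmul(C1, C0))
print(C)            # the matrix C displayed above
print(charpoly(C))  # coefficients of p(x); p(x^3) is the n = 3 polynomial
```

The coefficient list corresponds to $p(x) = x^3 - 15x^2 - 87x - 125$, so $p(x^3)$ agrees with the polynomial listed for $n = 3$.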


Solution II and generalization by the proposers. We prove that if $n_1$, $a_1$ are positive integers
such that $a_1^{1/n_1}$ is not an integer, then $\sum_{j=1}^{s} a_j^{1/n_j}$ is irrational for all choices of positive
integers $a_2, \ldots, a_s$ and $n_2, \ldots, n_s$. Let $K = Q[x_1, \ldots, x_s, \omega_1, \ldots, \omega_s]$ be the finite field
extension of the rationals formed by adjoining $x_1, \ldots, x_s$ and $\omega_1, \ldots, \omega_s$, where $x_j$ is the
positive $n_j$th root of $a_j$, and $\omega_j = e^{2\pi i/n_j}$. This extension contains all the solutions of
the polynomial $\prod_{j=1}^{s} [(x^{n_j} - a_j)(x^{n_j} - 1)]$ and hence is normal. Thus there exists an
automorphism $\sigma : K \to K$ fixing each rational that also satisfies $\sigma(x_1) \ne x_1$. Applying
$\sigma$ to the equation $x_j^{n_j} = a_j$ shows that $\sigma(x_j)$ is also a solution to the equation $x^{n_j} = a_j$.
Hence $\sigma(x_j) = \eta_j x_j$, where $\eta_j = \omega_j^{k_j}$ with $0 \le k_j \le n_j - 1$. Furthermore, $k_1 \ne 0$.
Suppose $r = \sum_{j=1}^{s} x_j$ is rational. Applying $\sigma$ yields $r = \sum_{j=1}^{s} \eta_j x_j$. Moving all terms to
the left and taking real parts yields $\sum_{j=1}^{s} (1 - \cos(\arg \eta_j))\,x_j = 0$. This forces $\arg \eta_j = 0$ for
all $j$, which contradicts $k_1 \ne 0$.


Solved also by S.-J. Bang (Korea), R. J. Chapman (U. K.), T. P. Dence, H. S. Gunaratne (Brunei), Ignotus (Mozambique),
D. W. Koster, K.-W. Lau (Hong Kong), O. P. Lossers (The Netherlands), C. Rees, A. Tissier (France), T. Zeanah,
and the GCHQ Problem Solving Group (U. K.)


Rencontres and Random Binary Operations 


10280 [1993, 76]. Proposed by Donald E. Knuth, Stanford University, Stanford, CA. 


Define a random binary operation $*$ on the set $\{1, \ldots, n\}$ by choosing every value independently,
so that each of the $n^{n^2}$ possible binary operations is equally likely.
(a) Prove that the axiom

$$((x * x) * x) * ((x * x) * x) = x$$

holds for $1 \le x \le n$ with probability

$$\sum_{k=1}^{n} \frac{p_{n,k}}{n^{2n-k}},$$

where $p_{n,k}$ is the number of permutations of $\{1, \ldots, n\}$ with $k$ fixed elements.
(b) Show that the probability in (a) is asymptotic to $\frac{1}{2} e^{n-1}\, n!/n^{2n}$ as $n \to \infty$.


Note: The sum in (a) should start at 0 rather than 1. 


Composite solution by Robin J. Chapman, University of Exeter, Exeter, U. K., and Allan 
Pedersen, S¢borg, Denmark. 

(a) If $*$ satisfies the axiom, define $\pi : \{1, \ldots, n\} \to \{1, \ldots, n\}$ by $\pi(x) = x * x$ for $x$ in
$\{1, \ldots, n\}$. The axiom shows that $\pi$ is surjective, and thus $\pi$ is a permutation. The axiom
may now be written as $\pi(x) * x = \pi^{-1}(x)$.

Now let $\pi$ be a permutation of $\{1, \ldots, n\}$ with $k$ fixed points. It will suffice to show
that the number of binary operations satisfying the axiom, and associated in this way to $\pi$,
is $n^{n^2 - (2n - k)}$. If $*$ is such a binary operation then the $n$ products $x * x$ are determined by
$\pi$, as are the products $\pi(x) * x$, of which there are $n - k$ not of the form $x * x$. The other
$n^2 - (2n - k)$ products may be chosen arbitrarily, yielding $n^{n^2 - (2n - k)}$ possibilities for $*$.
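
For very small $n$ the count in part (a) can be confirmed by exhaustive enumeration. The sketch below assumes the axiom reads $((x*x)*x)*((x*x)*x) = x$ (the form consistent with the reformulation $\pi(x) * x = \pi^{-1}(x)$) and uses the elements $0, \ldots, n-1$ instead of $1, \ldots, n$. It compares the brute-force count with $\sum_k p_{n,k}\, n^{n^2 - (2n - k)}$, where the sum starts at $k = 0$ as the Note requires. The function names are illustrative.

```python
from itertools import permutations, product


def count_brute(n):
    """Count all binary operations on {0,...,n-1} that satisfy the axiom."""
    elems = range(n)
    count = 0
    for table in product(elems, repeat=n * n):
        def op(a, b):
            return table[a * n + b]     # table is the flattened Cayley table
        if all(op(op(op(x, x), x), op(op(x, x), x)) == x for x in elems):
            count += 1
    return count


def count_formula(n):
    """Sum over permutations with k fixed points of n^(n^2 - (2n - k))."""
    total = 0
    for perm in permutations(range(n)):
        k = sum(1 for i in range(n) if perm[i] == i)
        total += n ** (n * n - (2 * n - k))
    return total


for n in (1, 2, 3):
    print(n, count_brute(n), count_formula(n))
```

Only $n \le 3$ is practical here, since the number of operations grows as $n^{n^2}$.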

(b) It is well known that $p_{n,k} = \binom{n}{k} p_{n-k,0}$ and that $p_{n,0} = n! \sum_{r=0}^{n} (-1)^r/r!$. Then we
have

$$\sum_{k=0}^{n} p_{n,k}\, x^k = n! \sum_{r=0}^{n} \frac{(x - 1)^r}{r!},$$

since the coefficient of $x^k$ on the right is

$$n! \sum_{r=k}^{n} \frac{(-1)^{r-k}}{r!} \binom{r}{k} = \binom{n}{k} (n - k)! \sum_{r=k}^{n} \frac{(-1)^{r-k}}{(r - k)!} = p_{n,k}.$$

Thus the probability in (a) is

$$\frac{n!}{n^{2n}} \sum_{r=0}^{n} \frac{(n - 1)^r}{r!}.$$

It is known that $\sum_{r=0}^{n} n^r/r! \sim (1/2)e^n$ as $n \to \infty$. (See, for example, D. J. Newman, A
Problem Seminar, Springer (1982), problem 96.) Thus

$$\sum_{r=0}^{n} \frac{(n - 1)^r}{r!} \sim \frac{1}{2} e^{n-1} + \frac{(n - 1)^n}{n!} \sim \frac{1}{2} e^{n-1}$$

by Stirling’s formula, and the desired asymptotic formula follows.
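
As a numerical illustration of the limit used in part (b), the following sketch evaluates $e^{-(n-1)} \sum_{r=0}^{n} (n-1)^r/r!$ in log space (to avoid overflow) for a few values of $n \ge 2$. The function name is hypothetical, and the computation only illustrates the slow convergence toward $1/2$; it proves nothing.

```python
import math


def scaled_partial_sum(n):
    """Return exp(-(n-1)) * sum_{r=0}^{n} (n-1)^r / r!  for n >= 2."""
    lam = n - 1
    return sum(math.exp(r * math.log(lam) - lam - math.lgamma(r + 1))
               for r in range(n + 1))


for n in (10, 100, 1000, 10000):
    print(n, scaled_partial_sum(n))
```

The values decrease toward $1/2$ at a rate of order $1/\sqrt{n}$, which matches the size of the term $(n-1)^n/n!$ discarded by Stirling's formula.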


Editorial comment. Solvers used a variety of methods for asymptotic evaluation of the
sum in (b), of which the simplest is the central limit theorem: if $X_1, \ldots, X_n$ are independent
Poisson random variables with mean 1 and standard deviation 1 then $S_n = X_1 + \cdots + X_n$
has a Poisson distribution with mean $n$ and standard deviation $\sqrt{n}$. Thus


$$e^{-n} \sum_{r=0}^{n} \frac{n^r}{r!} = P(S_n \le n) = P\big((S_n - n)/\sqrt{n} \le 0\big).$$


Since the central limit theorem implies that $(S_n - n)/\sqrt{n}$ approaches a standard normal
distribution as $n \to \infty$, this probability approaches 1/2.

The proposer noted that the probability in part (a) can be expressed in terms of Ramanu- 
jan’s function Q(n). Using the asymptotics for Q(n) discussed in his book Fundamental 
Algorithms, Addison-Wesley (1968), p. 117, he gave an asymptotic series for this probability: 


1 10 1 53 
/2 — 4 [a O(n~?/*) |. 
2en" ( mn 3 r 72n = 135n + OW ) 


Dennis P. Walsh and the proposer showed that in the variation of this problem in which
only the $n^{(n^2+n)/2}$ commutative binary operations are considered, the probability that the
axiom is satisfied is asymptotic to $(1/2)\,e^{n-3/2}\, n!/n^{2n}$.


Solved also by D. Callan, J. A. Grzesik, S. C. Kian (Singapore), O. P. Lossers (The Netherlands), A. D. Melas
(Greece), R. Sprugnoli (Italy), D. P. Walsh, A. N. ’t Woord (The Netherlands), GCHQ Problem Solving Group (U. K.),
Western Maryland College Problems group, and the proposer. Part (a) only solved by D. Beckwith and J. C. Binz
(Switzerland).


Collaborating editors: David F Appleyard, Paul T. Bateman, Bruce C. Berndt, Duane 
M. Broline, Barry W. Brunson, Frank S. Cater, Gulbank D. Chakerian, Underwood Dudley, 
Gerald A. Edgar, Michael A. Filaseta, Ira M. Gessel, Richard A. Gibbs, Jerrold R. Griggs, 
Douglas A. Hensley, John R. Isbell, Mourad E. H. Ismail, Murray Klamkin, Daniel J. Kleitman, 
Frederick W. Luttmann, Frank B. Miles, Richard Pfiefer, Stephen L. Portnoy, J. O. Shallit, John 
Henry Steelman, Kenneth B. Stolarsky, David E. Tepper, Douglas B. Tyler, Daniel Ullman, and 
William E. Watkins. 


Stirling Numbers 


Philippe Flajolet has informed me of the astonishing fact that the
notations $\left[{n \atop k}\right]$ and $\left\{{n \atop k}\right\}$ for Stirling numbers recommended in my paper
“Two Notes on Notation,” Monthly 99 (1992), 403-422, were used
already by J. Karamata in “Théorèmes sur la sommabilité exponentielle
et d'autres sommabilités s'y rattachant,” Mathematica (Cluj) 9 (1935),
164-178.


Donald E. Knuth 

Department of Computer Science 
Stanford University 

Stanford, CA 94305 
Football Pools—A Game
for Mathematicians


Heikki Hämäläinen, Iiro Honkala, Simon Litsyn,
Patric Östergård


1. INTRODUCTION. In a popular game played regularly in many countries one 
tries to guess the outcomes in a number of competitions or matches. By an
outcome we here mean that the host wins, loses or plays a draw; at any rate we require
that the number of possible outcomes is three (or more generally, some finite
number). One can find many small newsletters devoted to this game which contain
plenty of different systems for “winning strategies,” that is, sets of guesses which 
have some nice, relevant properties. 

At the same time another group of people—combinatorialists—are busy trying 
to produce many different types of arrays of numbers with special features. We 
have the feeling that the two groups are not well acquainted with each other’s 
work, although recently some mathematical journals have published papers report- 
ing results obtained by the playing community, and we hope that this paper will 
contribute to increasing the mathematicians’ interest in these problems. The 
problems are natural and mathematically easy to formulate, but are highly non- 
trivial and can be attacked using powerful combinatorial machinery. 

Some interesting combinatorial objects have been independently discovered by 
both of these groups. For example, the ternary Golay code was published in the 
Finnish football pool magazine in 1947 and presented independently in the 
mathematical literature in 1949 (see [34, Chapter 20] and [16]). Of course, the two
versions were constructed for different purposes: to be used as a particularly nice
football pool system by the playing community and as a nice error-correcting code
by the mathematical community. Needless to say, the authors were unaware of
each other’s discoveries.

In this paper we want to describe several old, well-known problems connected 
with football pools, as well as some interesting new combinatorial objects arising in 
this context. We give a rigorous mathematical formulation for each problem and 
give references to the existing literature and in some cases also tables of the best 
known numerical results. 

To formulate our problem more precisely, assume that we have n matches. In 
the classical literature one always speaks of football matches, which explains the 
title of this paper. In each match there are three possible outcomes: 0, 1 and 2. A 
forecast is a vector (or a word) of length $n$ whose components belong to the set
$Z_3 = \{0, 1, 2\}$, the integers modulo 3. A system (of forecasts) of size $M$ consists of
$M$ such vectors. After the matches have been played an entirely correct forecast is
said to win the first prize, and more generally, a forecast with $i - 1$ incorrect
guesses wins the $i$th prize. The first prize is usually quite large, and other prizes
are smaller and decreasing with $i$. Furthermore, the size of the $i$th prize depends
on the total number of forecasts winning the $i$th prize. Starting from some $i$, the
prizes equal 0 (usually from $i = 5$).

The organizers of this game use a fixed percentage of the stakes to pay the 
prizes, so without any insight into the teams playing the matches there is really 
nothing you can do in the long run to get more money than you invested. If you 
have some expertise, however, you might consider some of the outcomes highly 
unlikely and wish to exclude them and only concentrate on the remaining possibili- 
ties. Then you can use a suitable system and try to win some prize by using as little 
money as possible. Of course, you always have a chance of a big win as well. 

We say that the Hamming distance $d(a, b)$ between two words $a = a_1 a_2 \ldots a_n$
and $b = b_1 b_2 \ldots b_n$ is the number of indices $i$ for which $a_i \ne b_i$. The Hamming
weight $\mathrm{wt}(a)$ is defined to be the number of indices $i$ for which $a_i \ne 0$.


2. PROBLEMS 


2.1. The classical football pool problem. The earliest and most natural problem, 
called the football pool problem, is to try to construct a system for m matches which 
guarantees you at least the second prize. More generally, you wish to construct a 
system guaranteeing you at least the (r + 1)st prize. Assuming that you already 
know beforehand (or think that you know) the outcome in some m — n matches 
reduces the problem to finding a similar system for $n$ matches.

Mathematically, we wish to find the smallest subset $S$ of $Z_3^n$ such that for every
$x \in Z_3^n$ there exists a word $s \in S$ such that $d(x, s) \le r$, that is, the covering radius
of $S$ is at most $r$. This kind of covering radius problem has been widely studied in
information theory [7, 8, 12, 31]. For $r = 0$ the solution is trivial: we simply take
$S = Z_3^n$. For radius $r = 1$ the problem is already open in general.


Example. Consider the case $n = 4$. It can be verified that each of the 81 points of
$Z_3^4$ is within Hamming distance one from at least one of the nine words


0000, 0112, 0221, 1022, 1101, 1210, 2011, 2120, 2202. 


Since each word in $Z_3^n$ has distance at most 1 to exactly $2n + 1$ words in $Z_3^n$ we
know that when $r = 1$ we need to have at least $3^n/(2n + 1)$ words in our system.
This is called the sphere covering lower bound:


$$|S| \ge 3^n/(2n + 1).$$


Therefore our system for $n = 4$ is the smallest possible. In general, if $n$ is of the
form $(3^h - 1)/2$ then this bound is tight, because of the existence of Hamming
codes, cf. [30, p. 36]. The words in the Hamming code are exactly the solutions $s$ of
the equation (all the operations are modulo 3)


$$H s^T = 0,$$
where the columns of $H$ form a maximum set of pairwise linearly independent
vectors in $Z_3^h$, i.e., they represent distinct points of the projective geometry
$PG(h - 1, 3)$.
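
The $n = 4$ example and the sphere covering bound can be checked by brute force. The following is a minimal sketch; the helper names `hamming` and `covering_radius` are introduced here for illustration only.

```python
from itertools import product


def hamming(a, b):
    """Number of positions in which the words a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def covering_radius(system, q, n):
    """Largest distance from any word of Z_q^n to its nearest system word."""
    return max(min(hamming(x, s) for s in system)
               for x in product(range(q), repeat=n))


words = "0000 0112 0221 1022 1101 1210 2011 2120 2202".split()
system = [tuple(int(ch) for ch in w) for w in words]

n = 4
radius = covering_radius(system, 3, n)
bound = -(-3**n // (2 * n + 1))   # ceiling of 3^n / (2n + 1)
print(radius, bound, len(system))
```

The covering radius is 1 and the system size equals the bound, so no smaller system for four matches can guarantee the second prize.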

When r= 1 the smallest possible cardinalities of S are also known for the 
values n = 1, 2, 3 and 5, and are 1, 3, 5 and 27, but are unknown for all other 
values of n. 

For the general $r$, we have the following general sphere covering bound:


$$|S| \ge 3^n \Big/ \sum_{i=0}^{r} \binom{n}{i} 2^i.$$




Example. When $1 < r < n$, the sphere covering bound is attained only when
$n = 11$ and $r = 2$ [49], [34]. Such a system may be constructed in the following way.
Take all the words $(x_1, x_2, \ldots, x_{11})$, $x_i \in Z_3$, $i = 1, \ldots, 11$, such that $x_1, \ldots, x_6$
take on all possible combinations of values 0, 1, 2 and the remaining five components
are computed from the following system of equations (all the operations
modulo 3):


$$\begin{aligned}
x_7 &= x_1 + x_3 + 2x_4 + 2x_5 + x_6,\\
x_8 &= x_1 + x_2 + x_4 + 2x_5 + 2x_6,\\
x_9 &= x_1 + 2x_2 + x_3 + x_5 + 2x_6,\\
x_{10} &= x_1 + 2x_2 + 2x_3 + x_4 + x_6,\\
x_{11} &= x_1 + x_2 + 2x_3 + 2x_4 + x_5.
\end{aligned}$$


The constructed set represents the ternary Golay code, see [34] and [9] for other 
definitions and many connections with other areas of mathematics: combinatorics, 
sphere packings and groups. 
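
A short computation makes the perfection of this system plausible without enumerating all $3^{11}$ words. The sketch below assumes the five check equations have the coefficient rows written in `PARITY` (the usual systematic form of the ternary Golay code; the printed equations are hard to read in this copy, so treat the rows as an assumption). It verifies that the 729 words have minimum nonzero weight 5, so radius-2 spheres around them are disjoint, and that $729 \cdot (1 + 22 + 220) = 3^{11}$, so those spheres fill the whole space.

```python
from itertools import product
from math import comb

# Assumed coefficients of (x1,...,x6) in the equations for x7,...,x11.
PARITY = [
    (1, 0, 1, 2, 2, 1),
    (1, 1, 0, 1, 2, 2),
    (1, 2, 1, 0, 1, 2),
    (1, 2, 2, 1, 0, 1),
    (1, 1, 2, 2, 1, 0),
]


def codeword(info):
    """Extend six free symbols by the five check symbols (arithmetic mod 3)."""
    checks = tuple(sum(c * x for c, x in zip(row, info)) % 3 for row in PARITY)
    return tuple(info) + checks


code = [codeword(info) for info in product(range(3), repeat=6)]

# The code is linear, so its minimum distance equals its minimum nonzero weight.
min_weight = min(sum(1 for s in w if s != 0) for w in code if any(w))

# Number of words within Hamming distance 2 of a fixed word of Z_3^11.
sphere = sum(comb(11, i) * 2**i for i in range(3))

print(len(code), min_weight, sphere, len(code) * sphere == 3**11)
```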


In Table 1 the best currently known lower and upper bounds on the smallest
possible cardinality $K_3(n, r)$ of a system $S \subseteq Z_3^n$ with covering radius $r$ are shown.

For constructions of the codes, refinements on the sphere-covering bound for 
the classical football pool problem, and some earlier results, see, e.g., [1, 2, 3, 4, 11, 
13, 16, 23, 24, 25, 26, 27, 31, 36, 37, 39, 43, 50, 51, 52, 54, 56]. 


TABLE 1. Bounds for $K_3(n, r)$, the minimum cardinality
of a set $S \subseteq Z_3^n$ with covering radius $r$.


n\r   1             2            3
 1    1
 2    3             1
 3    5             3            1
 4    9             3            3
 5    27            8            3
 6    63-73         12-17        6
 7    150-186       26-34        7-12
 8    393-486       52-81        13-27
 9    1048-1356     128-219      25-54
10    2818-3645     323-558      57-108
11    7767-9477     729          115-243
12    21395-27702   1919-2187    282-729
13    59049         5062-6561    609-1215


2.2. The binary covering radius problem. As mentioned in the introduction, it is 
sometimes natural to feel confident that some outcomes do not occur and to 
exclude them. If we exclude one of the outcomes in each of the n matches we are 
left with the binary covering radius problem: we wish to find a set $S \subseteq Z_2^n$ such
that every $x$ in $Z_2^n$ is within Hamming distance $r$ from at least one word $s \in S$.
Here $Z_2$ is the set of integers modulo 2. In the same way as in the ternary case, we
immediately obtain the binary sphere covering bound:


$$|S| \ge 2^n \Big/ \sum_{i=0}^{r} \binom{n}{i}.$$


In fact, if $r = 1$ and $n$ is of the form $n = 2^h - 1$, or $r = 3$ and $n = 23$, or
$n = 2r + 1$, or $n = r$, then it can be shown that this bound is tight, and these are
the only cases, see [49], [34].
Example. When $n = 4$ and $r = 1$ then each point in $Z_2^4$ is within Hamming
distance one from at least one of the words

0000, 1000, 0111, 1111. 
So, the fact that we decided to exclude one of the possibilities in each match 
allowed us to decrease the number of words in the system from 9 to 4. 


Example [8]. Consider the case n = 11 and r = 1. The well-known Steiner system 
S(4, 5, 11) is a collection of 66 5-element subsets called blocks of an 11-element set 
A and has the property that every 4-element subset of A is contained in exactly 
one block. The blocks can be viewed as binary words of length 11, each having 
exactly five ones, that is, weight five. The 66 blocks and their complements in A 
viewed as binary words (of weight 5 and 6) form a system such that every binary 
word of weight from 4 to 7 is within Hamming distance one from exactly one of 
these 132 words. What remains is to cover the words of weight from 0 to 2 and 
their complements in an efficient way. This can be done as follows. Partition the
set A into two parts, one with cardinality five and the other with cardinality six.
Take all the 2-element subsets of these two parts and all the 1-element subsets of
the part with five elements. These 10 + 15 + 5 = 30 subsets and their comple-
ments will do. The resulting system contains 192 words and has covering radius 
one. 


In the first column of Table 2 the best currently known lower and upper bounds
on the smallest possible cardinality $K(n, 1, 1)$ of a system $S \subseteq Z_2^n$ with covering
radius at most 1 are given.

TABLE 2. Bounds for $K(n, 1, \mu)$, the minimum cardinality of a set $S \subseteq Z_2^n$
such that every $x \in Z_2^n$ is within Hamming distance 1
from at least $\mu$ elements of $S$.


n\μ 1 2 3 4

1 1 2 

2 2 3 4 

3 2 4 6 8 

4 4 8 11 14 

5 7 12 16 22 

6 12 19-20 30-32 38-40 

7 16 32 48 64 

8 32 58-64 90-94 114-125 

9 55-62 104-112 154-160 206-220 
10 105-120 187-220 289-320 374-416 
11 176-192 342-380 512 684-704 
12 342-380 631-752 972-1024 1262-1376 
13 598-736 1172-1280 1756-1984 2342-2560 
14 1171-1408 2186-2560 3356-3776 4370-4992 
15 2048 4096 6144 8192 
16 4096 7711-8192 11809-12288 15422-16384 


For constructions of binary covering codes and methods to derive lower bounds, 
see [7, 8, 10, 12, 13, 17, 18, 19, 20, 21, 22, 28, 35, 38, 40, 41, 47, 51, 54, 57, 58]. 


2.3. The mixed case. If we decide to exclude one of the three outcomes only in 
some, say b, of the n matches, we have the so-called mixed case. The sphere 
covering lower bound is then 


$$|S| \ge 2^{b}\, 3^{n-b} \Big/ \sum_{i+j \le r} \binom{n-b}{i} \binom{b}{j} 2^{i}.$$


Example. Consider again the case $n = 4$. Each word in $Z_3^2 Z_2^2$ is within Hamming
distance one from one of the six words (cf. [16, 41]):


0011, 0200, 1000, 1211, 2101, 2110. 


So, by excluding one of the three possible outcomes in two of the matches we can 
decrease the number of words in the system from 9 to 6. 


In Table 3 the best currently known upper bounds on the smallest possible
cardinality $K_{3,2}(n_1, n_2; 1)$ of a system $S \subseteq Z_3^{n_1} Z_2^{n_2}$ with covering radius 1 are given.
Lower bounds are not shown. However, the known exact values are marked by a
period. For constructions of (these and other) mixed covering codes and lower
bounds see [10, 16, 29, 33, 38, 41, 52, 53].


TABLE 3. Bounds for $K_{3,2}(n_1, n_2; 1)$, the minimum cardinality
of a set $S \subseteq Z_3^{n_1} Z_2^{n_2}$ with covering radius 1.


n_1\n_2 1 2 3 4 5 6 7 8 9 10 11 12


1 2. 3. 6. 8. 16. 24. 48 84 160 284 548 1024 

2 4. 6. 12. 20. 36 64 126 234 419 768 1504 

3 9. 16. 24. 48 92 176 320 576 1120 2080 

4 18. 36 72 132 240 432 864 1296 2592 

5 54 96 168 324 639 1206 1944 3888 

6 132 252 468 864 1656 2916 5832 
7 
8 


10 7047 =13122 25192 
11 = 18894 37788 
12 = 52488 


More generally, we can assume that in each competition there is a different 
number of possible outcomes. For example, in horse races when trying to pick out 
the winner in each heat you may have a different number of likely winners that you 
wish to concentrate on. 


2.4. Multiple coverings. Suppose a group of $\mu$ players join their efforts and wish
to find a system which guarantees them at least $\mu$ prizes (one for each!) each of
which is at least the $(r + 1)$st prize. An evident strategy is that each player
individually uses a usual football pool system that guarantees at least the $(r + 1)$st
prize. However, this is not always an efficient strategy.


Example. Let $n = 4$ and $r = 1$. As 2-fold and 3-fold coverings of the space $Z_2^4$ we
could use 2-fold and 3-fold repetitions of the system of four vectors described in 
the first example of Section 2.2, thus obtaining systems with eight and twelve 
vectors. Nevertheless, it is possible to do better. The seven words 


0001, 0010, 0011, 1100, 1100, 0111, 1011, 
and the eleven words 


0001, 0010, 0100, 0011, 0101, 1001, 1010, 1100, 0111, 1101, 1110, 


provide 2- and 3-fold coverings, respectively. Notice that we have here used the 
same word more than once, which sometimes does help, cf. [14]. 
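
The multiple-covering claims for $n = 4$ are small enough to verify exhaustively. The sketch below counts, for every binary word of length 4, how many system words (with repetition) lie within Hamming distance 1, and reports the minimum of these counts. The function name `min_multiplicity` is introduced here for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions in which the words a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def min_multiplicity(system, n, r=1):
    """Smallest number of system words within distance r of any word of Z_2^n."""
    return min(sum(1 for s in system if hamming(x, s) <= r)
               for x in product("01", repeat=n))


single = "0000 1000 0111 1111".split()                      # Section 2.2
double = "0001 0010 0011 1100 1100 0111 1011".split()       # seven words
triple = ("0001 0010 0100 0011 0101 1001 1010 1100 "
          "0111 1101 1110").split()                          # eleven words

print(min_multiplicity(single, 4),
      min_multiplicity(double, 4),
      min_multiplicity(triple, 4))
```

Because repeated words are kept in the list, the word 1100 is counted twice in the seven-word system, exactly as the example intends.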


Example. Let $n = 11$ and $r = 1$. We want to find a 3-fold covering of $Z_2^{11}$. We
construct the system by taking all vectors $(x_1, x_2, \ldots, x_{11})$, $x_i \in Z_2$, $i = 1, \ldots, 11$,
such that
$$x_{10} = x_1 + x_2 + x_3 + x_4 + x_5,$$
$$x_{11} = x_1 + x_2 + x_3 + x_6 + x_7$$


(where the summation is taken modulo 2). It is easy to check that the system
consists of $2^9$ vectors, which is the best possible since it attains the sphere-covering
bound:


$$|S| = 2^9 = 3 \times 2^{11}/12.$$


In Table 2 we give the best currently known lower and upper bounds on the
smallest cardinality $K(n, 1, \mu)$ of a system $S \subseteq Z_2^n$ that is a $\mu$-fold covering when
$r = 1$. Notice that in Table 2 we have assumed that no word is permitted to appear
in the system more than once. For constructions and lower bounds, see [5, 6, 14, 
55]. 


2.5. Multiple coverings of the farthest-off points. Since the first prize is usually 
much bigger than the second one, and the second one is bigger than the third one, 
a natural goal is to try to guarantee one big prize or several small prizes. 


Example. If we use the system consisting of the six words 

1111, 1111, 1000, 0100, 0010, 0001, 
in $Z_2^4$ we will always get at least one entirely correct forecast or at least two
forecasts with one incorrect entry, as can easily be checked. This means that every
word of $Z_2^4$ not belonging to the system is covered at least twice. In comparison, we
need four words to provide a 1-fold covering and seven words for a 2-fold covering.


For other constructions see [6, 15]. 


2.6. Weighted coverings with decreasing weights. A general statement of the
problem is that the player chooses some decreasing sequence of rational numbers
$m = (m_0, m_1, \ldots, m_n)$. For every word $x \in Z_3^{n_1} Z_2^{n_2}$, $n_1 + n_2 = n$, define the vector
$a(x) = (a_0(x), \ldots, a_n(x))$, where $a_i(x)$ stands for the number of words in the
system being at Hamming distance $i$ from $x$. A system $S$ is called a weighted
$m$-covering if for every $x \in Z_3^{n_1} Z_2^{n_2}$ the inequality


$$\sum_{i=0}^{n} m_i\, a_i(x) \ge 1$$


holds.

Notice that the systems in the examples where $n = 4$ are all special cases of this
problem for the nonzero weights $m_0 = m_1 = 1$, $m_0 = m_1 = 1/2$, $m_0 = m_1 = 1/3$,
and $m_0 = 1$, $m_1 = 1/2$.


Example. Consider $Z_2^4$, and choose $m_0 = m_1 = 1/2$, $m_2 = 1/4$. The following
system of four words

0000, 1100, 0011, 1111, 
will guarantee one first prize and two third prizes, or two second prizes, or four 
third prizes. 
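
The weighted covering condition of this section can be checked mechanically for the four-word example. The sketch below assumes the weights $m_0 = m_1 = 1/2$, $m_2 = 1/4$ (all other weights zero) and uses exact rational arithmetic; `weighted_cover_min` is a name introduced for this illustration. It returns the minimum over all $x$ of $\sum_i m_i a_i(x)$, which must be at least 1 for a weighted $m$-covering.

```python
from fractions import Fraction
from itertools import product


def weighted_cover_min(system, n, weights):
    """Minimum over x in Z_2^n of the sum of weights[d(x, s)] over system words s."""
    best = None
    for x in product("01", repeat=n):
        total = Fraction(0)
        for s in system:
            d = sum(a != b for a, b in zip(x, s))   # Hamming distance
            total += weights.get(d, Fraction(0))    # distances without a weight count 0
        best = total if best is None else min(best, total)
    return best


weights = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(1, 4)}
system = ["0000", "1100", "0011", "1111"]

print(weighted_cover_min(system, 4, weights))
```

Setting `weights = {0: Fraction(1), 1: Fraction(1)}` turns the same function into a test for an ordinary covering of radius 1.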


For constructions and bounds, see [6]. 


2.7. Other related problems. A natural generalization of all the problems mentioned
above is to require that only some part $A$ of the whole space $Z_3^n$ is to be
taken care of. One obvious choice for $A$ would be that $A$ itself is a Hamming
sphere $B_R(x)$ for some integer $R$ and $x \in Z_3^n$. This sphere consists of the words
within Hamming distance $R$ from the word $x$. This corresponds to the case where
a player is more or less convinced that the outcome x will occur, but accepts the 
possibility that he may be mistaken in some but not more than R matches. Also, if 
the player guesses that the number of home wins is at least n — R for some integer 
R, it is natural to take A as the set of words in which the number of 0’s is at least 
$n - R$, i.e., $B_R(00\ldots0)$, where $00\ldots0$ denotes the all-zero word.
Another possible choice for $A$ would be the complement
$$Z_3^n \setminus B_R(x)$$

of a Hamming sphere. This could be useful if a player has seen in a newspaper that 
the outcome x is the most probable (according to the opinions of experts), and he 
is personally not at all convinced by their estimate and wishes to cover the area 
which is as far as possible from the generally accepted guess. In this way he hopes 
that the number of other people using such forecasts will be small and if he wins a 
prize he will not have to share it with too many people. There is also another 
possible motivation for such a choice. Assume that x is the all-two word 22...2. 
Then $Z_3^n \setminus B_R(x)$ is exactly the set of words in which the number of 2’s is smaller
than n — R. So, if we think that the number of visitor wins is smaller than n — R, 
this seems a reasonable choice. Alternatively, we think that the outcome in each 
match will be 0 or 1 but we allow for the possibility that we are wrong about some 
matches. 


Example. The best way of covering a Hamming sphere $B_2(000\ldots0)$ in $Z_2^n$, $n \ge 3$,
with Hamming spheres of radius 1 uses $n - 1$ spheres: choose the words
$000\ldots0$, $111000\ldots0$ of length $n$ and all the words of weight 1 in $Z_2^n$ except
$1000\ldots0$, $0100\ldots0$, $0010\ldots0$. Clearly, the $n - 1$ spheres of radius 1 centered at
these points cover $B_2(000\ldots0)$.

To see that this is actually the smallest possible number of such spheres, assume
that there are exactly $n - i$ words of weight 1 in our system. Without loss of
generality we may assume that these $n - i$ words contain 1’s in the last $n - i$
coordinates. Then all the words of weight two that have at least one 1 among the
last $n - i$ coordinates are already covered. The number of other words that is
required to cover the remaining $\binom{i}{2}$ words of weight 2 is at least $\binom{i}{2}/3$. Hence the
total number of words required is at least


$$n - i + \binom{i}{2}\Big/3 > n - 2$$


when $i \le 2$ or $i \ge 5$. A direct verification shows that the result is correct also when
$i = 3$ or $i = 4$.


Example. Consider how $B_3(000\ldots0) \subseteq Z_2^n$ can be covered with Hamming spheres
of radius 1. Clearly, to cover all the words of weight 3, we must have at least
($n \ge 6$)


$$\binom{n}{3}\Big/(n - 2) = n^2/6 + O(n)$$


spheres of radius 1.

To get an upper bound we use the following construction. We first cover all the 
words of weight 3 by words of weight 2 by picking to our system all the words of 
weight 2 in which both the 1’s are in even-numbered coordinates or both are in 
odd-numbered coordinates. Since every word of weight 3 has two 1’s in either 
odd-numbered or even-numbered coordinates, they are all covered by these words
of weight 2. Graph theoretically, we are taking the complement of a triangle-free 
graph on n vertices (edges correspond to words of weight two). The words of 
weight 1 are clearly already covered, and to cover all the words of weight 0 and 2, 
it is sufficient to take the words of weight 1 where the single 1 is in an 
even-numbered coordinate. All in all, we get the upper bound $n^2/4 + O(n)$.


It is possible to generalize the weighted covering problem even further. Suppose
that for each $x \in Z_3^{n_1} Z_2^{n_2}$ we have weights $m_i(x)$ depending on $x$. We now want to
find a set $S \subset Z_3^{n_1} Z_2^{n_2}$ such that


$$\Theta(x) = \sum_i a_i(x) m_i(x) \ge 1$$


for all $x \in Z_3^{n_1} Z_2^{n_2}$. A special case of this is to assume that

$$m_i(x) = m(x) m_i$$


in which case we simply require different densities at different points. From the 
player’s point of view this means that he wishes to take into account some possible 
outcomes more than others. 
In mathematics, as in any scientific research, we find two tendencies
present. On the one hand, the tendency toward abstraction seeks to
crystallize the logical relations inherent in the maze of materials that
is being studied, and to correlate the material in a systematic and
orderly manner. On the other hand, the tendency toward intuitive


understanding fosters a more immediate grasp of the objects one 
studies, a live rapport with them, so to speak, which stresses the 
concrete meaning of their relations. 


— Hilbert 
Fresh Breezes in the Philosophy
of Mathematics¹


Reuben Hersh 


Since Pythagoras, philosophy of mathematics tried to account for mathematical existence and 
the nature of mathematical objects. 

Numbers, circles, n-dimensional manifolds, all are different from everything else we think 
about. They’re neither physical nor mental. Not mental, because the Pythagorean theorem or any 
other well-established mathematical fact is independent of what you or I think. Whether we 
know it and believe it or don’t know it and don’t believe it, the Pythagorean theorem is still true. 
Yet it’s not physical either! Plato and Aristotle explained that the triangles and circles of the 
geometer are not physical triangles or circles, but something “ideal.” 

Spiritual, empirical, psychological, formalist, and logicist explanations have been offered. 
None give a credible account of what we do when we do mathematics. Presently some authors 
are constructing a humanist answer. 


An Israeli mathematics education researcher, Anna Sfard, recently found an
interesting insight. In learning a mathematical concept, children first learn it as
an algorithm (a procedure or method). Later, the algorithm is transformed into an
object. She calls this “reification.” It’s difficult to achieve, often needing help from
a teacher. This story is close to theories of the Russian psychologist, Lev Vygotsky.

For example, subtraction is an algorithm. It isn’t hard. It reifies into negative 
numbers—very hard! 

Which mathematical entities are frozen algorithms? What’s the interaction 
between doing and being, algorithm and entity? This is a question in philosophy of
mathematics based on mathematical practice, on seeing mathematics as a human
activity. It’s not a foundationist question. 


FOUNDATIONS LOST. In books on philosophy of mathematics (Körner, or
Benacerraf & Putnam) you read of the leading problem, “foundations.” How can 
we establish mathematical knowledge as certain, indubitable, free of any possible 
doubt? Three historically important solutions to this problem were logicism 
(Platonism), formalism, intuitionism. All were unsuccessful. For logicism and 
formalism, no major new idea has come up in over half a century. Intuitionism and 
its daughter constructivism did strive to carry out the program of Brouwer 
streamlined by Bishop. But their goal of remaking mathematics constructively is 
more remote today than 60 or 70 years ago. 

The surviving scrap of foundationalism was named “neo-Fregeanism” by Philip 
Kitcher. This notion still dominates the philosophy of mathematics. It says: 


¹This article originated as an invited talk to the 1993 annual joint meeting of the sections on
mathematics and on philosophy of the New York Academy of Science. Thanks to Prof. Bruce Chandler 
and Prof. Harold Edwards for the invitation to the New York Academy. Double thanks to Prof. Hao 
Wang of Rockefeller University, whose hospitality in the spring of 1993 was generous and inspiring. 
“Philosophical thinking about mathematics need not concern itself with anything
but sets, and set theory’s twin sister, logic.” But most researchers, users, teachers, 
historians of mathematics aren’t primarily interested in sets. So philosophers of 
mathematics ignore mathematics and mathematicians, and mathematicians find 
nothing of interest in philosophy of mathematics. 

Deplorable! The principal problem in philosophy of mathematics left in paraly- 
sis for over half a century! Mathematicians and philosophers of mathematics 
ignorant of each other’s existence! A Harvard philosopher, Hilary Putnam, pub- 
lished a foundationalist paper titled “Mathematics Without Foundations.” Is 
philosophy of mathematics pointless and unnecessary? Or is it time for a fresh 
start? 


PHIL/M AND PHIL/SCI. One weird phenomenon of modern philosophy is
that philosophy of science and philosophy of mathematics are almost disjoint. 
Authors in philosophy of science rarely refer to philosophy of mathematics, and 
vice versa. An author who writes on both subjects, in any one article sticks to one 
or the other. It’s like baseball and football—play one or the other, but not both at 
the same time. 

I like to compare philosophy of mathematics today to philosophy of science in 
the 30’s and 40’s. That subject was dominated by logical positivists: Rudolf Carnap 
and his friends of the “Wiener Kreis” (Vienna Circle). As a result of taking 
Bertrand Russell and Ludwig Wittgenstein too seriously, they believed they knew 
the correct methodology for scientific work: (1) state the axioms; (2) give corre- 
spondence rules between words and physical observables; (3) derive the theory, as 
Euclid derived geometry, or Mach derived mechanics. 

It was noticed after a while that what logical positivists said had little in 
common with what scientists did or wanted to do. New ideas in philosophy of 
science came from Karl Popper, Tom Kuhn, Imre Lakatos, Paul Feyerabend.
These subversives disagreed with each other. But they all thought philosophers of 
science could think about what scientists actually do, not bring presuppositions and 
instructions for scientists to ignore. 

Philosophy of mathematics is overdue for its Popper, Kuhn, Lakatos, and 
Feyerabend. It’s overdue for analysis of what mathematicians actually do, and the 
philosophical issues therein. 

In fact, this turn is taking place. Wittgenstein and Lakatos helped start it. In 
recent years Michael Polanyi, George Polya, Alfred Renyi, Leslie White, Ray 
Wilder, Greg Chaitin, Phil Davis, Paul Ernest, Nick Goodman, Phil Kitcher, 
Penelope Maddy, Michael Resnik, Gian-Carlo Rota, Brian Rotman, Gabriel 
Stolzenberg, Robert Thomas, Tom Tymoczko, Jean Paul van Bendegem, and Hao 
Wang have participated. 

Here are ideas some of these people hold. 

1) Mathematics is human. It’s part of and fits into human culture. (Not Frege’s 
abstract, timeless, tenseless, objective reality.) 

2) Mathematical knowledge is fallible. Like science, mathematics can advance 
by making mistakes and then correcting and recorrecting them. (This “fallibilism” 
is brilliantly argued in Lakatos’ Proofs and Refutations.) 

3) There are different versions of proof or rigor, depending on time, place, and 
other things. The use of computers in proofs is a nontraditional version of rigor. 

4) Empirical evidence, numerical experimentation, probabilistic proof all help 
us decide what to believe in mathematics. Aristotelian logic isn’t necessarily always 
the best way of deciding. 
5) Mathematical objects are a special variety of social-cultural-historical object.
We can tell mathematics from literature or religion. Nevertheless, mathematical 
objects are shared ideas, like Moby Dick in literature, or the Immaculate Concep- 
tion in religion. 

How do humanists answer the big question, “What’s the nature of mathematical
objects?” 

The question seems difficult because of a centuries-old assumption in Western 
philosophy: “In the world there are two kinds of things. What’s not physical is 
mental; what’s not mental is physical.” When Frege proved that mathematics is 
neither physical nor mental, he accounted for it by means of a third kind of entity,
“abstract objects,” about which he could say nothing except that they’re neither
physical nor mental. 

Mental is thought, individual consciousness, subjectivity; wishes, fears, percep- 
tions, hopes, desires, private thoughts. 

Matter is what takes up space, has weight, can be studied by scientific instru- 
ments. Mountains, bugs, the stars, gamma rays. 

Is there anything that’s neither mental nor physical? Yes! Sonatas. Poems. 
Churches. Religions. Diplomas. Armies. Wars. Universities. Academies of science! 

Does the New York Academy of Science exist? Undoubtedly. Is it mental? If 
the Secretary and the President of the Academy died of amnesia, the life of the 
Academy would continue. The Academy isn’t just somebody’s thoughts! Even if the 
building were blown up and the trustees moved the Academy to Yonkers, it would 
go on. Its physical and mental embodiments are necessary, but they’re not it. The 
Academy isn’t just the minds and bodies of anyone. Neither is it just the stones of 
its building. 

What is it? It’s a social institution. The mental and physical aren’t sufficient to 
describe the New York Academy of Science. Nor are they sufficient to describe 
most of the things that most concern us. Marriage and divorce, employment, 
shopping, prices and salaries, war and peace, professional sports and television 
shows. All have mental and physical aspects, but they aren’t mental or physical 
entities. They’re social entities. 

There are not two but three basic kinds of things in the world. 

Now, what about mathematical objects—let’s just say numbers. If everything’s 
either mental, physical, or social, then what are numbers? We’ve already seen that 
numbers aren’t mental or physical. By the law of the excluded middle, they must be 
social. But let’s not be peremptory. Let’s consider it a hypothesis. Is mathematics 
social-cultural-historical? 

Certainly it’s historical. The history of mathematics is a developed subject. 
Historians have studied mathematics back to the Babylonians. We don’t know the 
remote origin of mathematics, or the remote origin of writing, speech, religion, or 
the family. That origin was part of the self-creation of the human race. Archeology, 
linguistics, genetics, ethnology tell us a little more. Counting and talking both had 
their human beginnings. 

Mathematics is a social entity. Mathematicians never were isolated hermits. 
Today they’re in academic, government or industrial jobs, paid directly or indirectly 
by the government. 

Srinivasa Ramanujan, the self-taught Indian mathematical genius, worked hard 
to be recognized by the English mathematics establishment. Once he was invited, 
he went to England, at a cost to his family, his religious commitment, and his 
ability to find daily food he could eat. He did so in order to work with
mathematicians who understood what he was doing.
In the 16th and 17th centuries, Fermat, Huygens, Leibnitz were assiduous letter
writers, constantly trading ideas with colleagues in other cities and other countries. 

Today a new result is certified as part of mathematics after experts read it and 
pronounce it good. We monitor our product. Acceptance by the profession is 
essential to be recognized or accepted as a mathematician. 

The overall content of mathematics and its direction of movement respond to 
the pressures of society. The militarization of U.S. mathematics in World War II is 
an example. 

Newton’s calculus was a tool in his theory of gravitation. His gravitation theory 
was a response to the need for better understanding of the motions of planets. The 
motions of planets were important because England was a maritime nation. 
Navigational methods better than those of Spain and Portugal had cash value for 
England. 

In saying this, I don’t underestimate the insistence of pure mathematicians on 
autonomy. 


TAKING THE TEST. To test a philosophy of mathematics, ask it questions: 


(1) What makes mathematics different? 

(2) What is mathematics about? 

(3) Why does mathematics achieve near-universal consensus? 

(4) How do we acquire knowledge of mathematics, apart from proof? 

(5) Why are mathematical results independent of time, place, race, nationality 
and gender, in spite of the social nature of mathematics? 

(6) Does the infinite exist? If so, how? 

(7) Why does pure mathematics so often become useful? 


The humanist approach gives better answers to questions 1 through 5 than the 
neo-Fregean, the intuitionist-constructivists, or any other proposed philosophy I 
know of.

Questions 6 and 7 are harder. I don’t say humanism answers these questions. 
But neither does anybody else. 

In conclusion, I want to destroy one of the most popular arrows opponents like 
to shoot at mathematical humanism. 

2 + 2 = 4, they say, everywhere and always. In fact, 2 + 2 = 4 before there
were human societies, or even human beings. When 2 brontosauruses went to the 
water hole and met two other brontosauruses, there were four brontosauruses at 
the water hole. The truths of mathematics are universal, independent not only of 
individual consciousness but of social consciousness. 

This is Platonism, the view that Wittgenstein attacked so fiercely, and the view, 
let’s face it, that most mathematicians accept. 

How can a humanist answer? 

First of all, “two” plays two roles. It’s an adjective and it’s a noun. When you say 
“two brontosauruses,” “two” is an adjective. “Two brontosauruses plus two
brontosauruses equals four brontosauruses” is a statement about brontosauruses, not
about numbers. Even if you say “Two discrete, reasonably permanent, non- 
interacting objects collected together with two others of the same ilk makes four 
such objects,” you are talking about properties of discrete, reasonably permanent 
non-interacting objects. That’s a statement in elementary physics. 
The noun “two,” on the other hand, as everybody since Pythagoras knows,
doesn’t name a physically observable thing. It names some abstract or ideal entity. 
Plato, Descartes, Frege knew that two is an ideal object. They explained what they 
meant by an ideal object only in negative terms—not mental, not physical. I’m 
pointing out that these abstract ideal objects are social concepts. 

“But,” says the Platonist, “how can you explain the fact that always and 
everywhere, regardless of time and place, politics or religion, race or sex, 2 + 2 
always equals 4? The only way to account for it is to say it’s an objective truth, 
which we all recognize because it’s an objective truth. Otherwise, the universal 
agreement that 2 + 2 = 4 would be an inexplicable miracle.” 

To this I answer, “It’s bad logic to say something must be true because you can’t 
think of any other explanation. That’s how philosophers used to prove the exis- 
tence of a Supreme Creator—they couldn’t conceive any other way for there to be 
a universe. 

“You say that because I haven’t got an explanation that satisfies you about the 
objectivity of mathematics, therefore I must believe in abstract entities whose 
relation to the physical world is obscure, which number incredibly remote uncount- 
able infinities, and which are apprehended by our mental or physical faculties in a 
quite unexplained manner. 

“I don’t believe in them. You believe in them only by closing your eyes to their
absurdity.” 

“I’m aware that some social or intersubjective concepts have the rigidity, the
reproducibility, of physical science. The reproducibility of a mathematical calcula- 
tion is comparable only to the reproducibility of a physical measurement or 
experiment.” 

Somebody might ask, “Why does the physical world have attributes which are so 
consistent, so reproducible? Why is the gravitational constant the same from one 
day to the next? Why is the speed of light in vacuum so reliable?” 

No physicist or philosopher feels obliged to answer such questions. The possibil- 
ity of a science of physics is something we accept. We start from there, we don’t try 
to go back of it. Heidegger asked, “Why is there a universe?” I don’t know what 
progress he made. Not a promising investigation. 

As there’s lawfulness and stability in parts of the physical world, there’s 
lawfulness and stability in parts of the social-conceptual world. I don’t know why 
this is so. I’m sure it’s a fruitless question, as fruitless as the same question about 
the physical world. 

Study of the lawful, predictable parts of the physical world has a name. That 
name is “physics.” Study of the lawful, predictable parts of the social-conceptual 
world has a name. That name is “mathematics.” 
Quadratics Representing Primes


Nigel Boston and Marshall L. Greenwood 


1. INTRODUCTION. This paper is the result of a collaboration between two 
people of different worlds, (Boston) a 32-year-old professional trained at Cam- 
bridge and Harvard and (Greenwood) a 78-year-old self-trained amateur working 
without computer or calculator, using excerpts from [11] D. N. Lehmer’s complete 
list of prime numbers (covering from 2 to 10,006,721) and an IBM Table (covering 
from 2 to 52,004,201). For small primes (< 100,000) he used tables from the 
Handbook of Math Functions. 

The collaboration began in November 1993 when a colleague of the first author 
presented him with a copy of Issue 30 of the second author’s mathematics 
newsletter [6]. It contained nonstandard notation and language created by the 
second author. As the first author read it, he realized an interesting problem was 
being attacked by serious methods. At first he thought that a computer attack 
would produce better results than many years of working by hand had. It turned 
out, however, that the second author had found the best example for quadratics 
with relatively small discriminant (see the remarks after the table in Section 3) and 
that a better example would require an extensive search. 

In the following month or so the second author’s health declined, and he agreed 
to make public his private methods, when urged to do so by Prof. Diamond. Some 
of his methods were known to the first author, others were clever observations 
culled from years of work. The first author undertook an extensive computer 
search, partly using the second author’s results and methods and partly going 
beyond them, and the outcome is presented below. 


2. A HISTORY. In 1772 Euler [4] noted that the quadratic $x^2 + x + 41$ represents
a prime for $x = 0, 1, \ldots, 39$. The reason behind this remarkable property was
made clear to number theorists by Rabinovitch [13] at the 1912 ICM, when he
showed that for prime $n > 0$, $x^2 + x + n$ represents a prime for $x = 0, 1, \ldots, n - 2$
if and only if the field $Q(\sqrt{1 - 4n})$ has class number one. Since $Q(\sqrt{-m})$ has
class number greater than one if $m > 163$ (as shown by Baker [1] and Stark [16] in
1966/7 and almost shown by a German high school teacher, Heegner [8], in 1952),
Euler’s example cannot be extended in this direction. It nevertheless does much
better than known linear polynomials in having successive prime values, since the
only such known that represents primes for $x = 0, \ldots, 21$ is $11410337850553 +
4609098694200x$, discovered recently by Paul Pritchard. See [18] for other good
examples.

Euler also considered the quadratic $2x^2 + n$ and showed that for $n = 3, 5, 11, 29$
this represents a prime for $x = 0, \ldots, n - 1$. In 1974 Hendy [9] proved that this is
so for prime $n > 0$ if and only if $Q(\sqrt{-2n})$ has class number two. Again Euler’s
result cannot be extended, since Baker [2] and Stark [17] in 1971 classified all such
fields of class number two. There are similar results for polynomials of the form
$px^2 + px + n$ (see [9]). For more history on the subject, see [3], vol. I, pp. 420-421.
Also, see [12] for partial results on quadratics with positive discriminants.
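The two observations of Euler quoted above are easy to confirm numerically. The following is a small sketch using trial division; the helper names are illustrative only.

```python
def is_prime(m):
    """Trial-division primality test, adequate for the small values used here."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def all_prime(poly, xs):
    """True if poly(x) is prime for every x in xs."""
    return all(is_prime(poly(x)) for x in xs)

# Euler's quadratic x^2 + x + 41 on x = 0, ..., 39.
euler_41 = all_prime(lambda x: x * x + x + 41, range(40))

# Primes n < 100 for which x^2 + x + n is prime for x = 0, ..., n - 2.
lucky = [n for n in range(2, 100)
         if is_prime(n) and all_prime(lambda x: x * x + x + n, range(n - 1))]

# Euler's second family 2x^2 + n on x = 0, ..., n - 1.
two_x2 = [n for n in (3, 5, 11, 29) if all_prime(lambda x: 2 * x * x + n, range(n))]

print(euler_41, lucky, two_x2)
```

The values of n collected in `lucky` give 1 - 4n = -7, -11, -19, -43, -67, -163, which is how the list lines up with Rabinovitch's class-number-one criterion.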
In another direction, several people have noted that there are, for example,
more primes of the form $x^2 - 2$ than of the form $x^2 + 1$ in the long run. In fact
there are consistently about 35% more. This was clarified by a conjecture of Hardy
and Littlewood [7], who suggested that the number of primes $< N$ represented by
$x^2 + c$ ($c$ not minus a square) should be asymptotically of the form $C\sqrt{N}/\log N$,
where $C$ is an explicit computable constant. For $x^2 - 2$, $C = 1.85$, whereas for
$x^2 + 1$, $C = 1.37$ [15]. A similar conjecture holds for any polynomial $ax^2 + bx + c$
if we assume that $a > 0$, that $a, b, c$ are relatively prime, that $a + b$ and $c$ are not both even,
and that the discriminant $D = b^2 - 4ac$ is not a square. (These conditions remove the
most obvious obstructions to the quadratic representing primes for large $x$. It still
has not been shown that every such quadratic represents at least one prime.) The
constant $C$ is given explicitly in terms of $a, b, c$ and is largest when $D$ is not a
square modulo many small primes. The reason for this will become clear later.
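The imbalance between $x^2 - 2$ and $x^2 + 1$ can be observed with a short count. This is only a numerical experiment over an arbitrarily chosen range (x up to 2000, an assumption made here for speed), not a computation of the constant C.

```python
def is_prime(m):
    """Trial-division primality test."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def count_prime_values(c, x_max):
    """Number of x in 1..x_max for which x^2 + c is prime."""
    return sum(1 for x in range(1, x_max + 1) if is_prime(x * x + c))

x_max = 2000
minus2 = count_prime_values(-2, x_max)
plus1 = count_prime_values(1, x_max)
ratio = minus2 / plus1
print(minus2, plus1, round(ratio, 2))
```

On the conjecture, the ratio should drift toward 1.85/1.37, roughly 1.35, as the range grows; over a short range some fluctuation around that value is to be expected.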

In [5] Fung and Williams found polynomials of the form $x^2 + x + c$ with large
$C$. Their best example was $c = 132874279528931$, for which $C = 5.09$. (Compare
$C = 3.32$ for Euler’s quadratic.) The corresponding $D = 1 - 4c =
-531497118115723$ is not a square modulo each prime from 3 to 179.


3. THE PROBLEM AND RESULTS. The problem attacked by the second author
was to find a quadratic polynomial that represents the most distinct primes for
$0 \le x \le 99$. If we drop the requirement that the primes be distinct, we can easily
get 95 primes by taking Euler’s $x^2 - x + 41$ and noting that for $n \le x \le n + 99$
for any choice of $n = -64, \ldots, -59, -39, \ldots, -34$ this quadratic represents 95
primes. Thus, for example, $(x - 34)^2 - (x - 34) + 41 = x^2 - 69x + 1231$ represents
95 primes for $x = 0, \ldots, 99$. We will call two quadratics equivalent if one can
be turned into the other by replacing $x$ by $x + n$ or $n - x$ for some $n$ or if one is
minus the other.

We list a representative of each of the 20 equivalence classes we have found
that give at least 86 distinct primes for $n \le x \le n + 99$ for some $n$. By replacing $x$
by $x + n$ as in the previous paragraph we then obtain quadratics that represent
this many distinct primes for $0 \le x \le 99$.


No./100  Quadratic                 Discriminant                     Intervals [n, n + 99]
90/100   $41x^2 + 33x - 43321$     $7105733$                        $n = -57$
88/100   $4x^2 + 2x + 41$          $-652 = -2^2 \cdot 163$          $n = -40, -39$
88/100   $8x^2 + 2x - 1097$        $35108 = 2^2 \cdot 8777$         $n = -66, -65, -57, \ldots, -53$
88/100   $27x^2 + 3x - 601$        $64917 = 3^2 \cdot 7213$         $n = -58$
88/100   $9x^2 + 3x - 16229$       $584253 = 3^4 \cdot 7213$        $n = -53, \ldots, -47$
88/100   $37x^2 + 23x - 8863$      $1312253$                        $n = -47, \ldots, -42$
88/100   $29x^2 + 9x - 22111$      $2564957$                        $n = -60$
88/100   $67x^2 + 45x - 12569$     $3370517$                        $n = -72, -61$
88/100   $73x^2 + 59x - 18541$     $5417453$                        $n = -60, -55, -54, -53, -39$
88/100   $59x^2 + 3x - 30109$      $7105733$                        $n = -35, \ldots, -31$
87/100   $2x^2 - 199$              $1592 = 2^2 \cdot 398$           $n = 0, 1$
87/100   $8x^2 + 6x - 661$         $21188 = 2^2 \cdot 5297$         $n = -71$
87/100   $17x^2 + 7x - 20351$      $1383917$                        $n = -57, -56$
87/100   $31x^2 + 21x - 13679$     $1696637$                        $n = -68, -66, \ldots, -61$
87/100   $41x^2 + 19x - 29879$     $4900517$                        $n = -40, -39, -36, -35$
87/100   $41x^2 + 39x - 33829$     $5549477$                        $n = -67, -66$
86/100   $x^2 + x + 41$            $-163$                           $n = 0, 1, 2$
86/100   $58x^2 + 42x - 15347$     $3562268 = 2^2 \cdot 890567$     $n = -53, -52$
86/100   $53x^2 + 35x - 26171$     $5549477$                        $n = -70, -67$
86/100   $82x^2 + 46x - 41647$     $13662332 = 2^2 \cdot 3415583$   $n = -39, -38$
The squarefree part of the discriminant is also presented, because it is relevant
in our discussion later. This question had been considered previously by Karst [10].
His best example was $2x^2 - 199$ above. Apart from this, none of his examples
makes the above table. This is probably because all his other quadratics are of the
form $ax^2 + bx + c$ with $a = b$, negative discriminant, and no prime factors $< 53$.
Karst’s $2x^2 - 199$ is also discussed in [12].

The two examples above with negative discriminant come out of Euler’s work.
The example with discriminant 35108 above was discovered by the second author
in 1991, presented as $-8x^2 + 530x - 7681$ for $0 \le x \le 99$. The example with
discriminant 64917 was discovered by the second author in 1992, with a better
interval provided by the first author. Our best example translates into $41x^2 -
4641x + 88007$ for $0 \le x \le 99$.
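Entries of the kind listed in this section can be rechecked with a short counting routine. The sketch below assumes that a value counts as a prime when its absolute value is prime (consistent with treating a quadratic and its negative as equivalent); the function name and parameters are illustrative.

```python
def is_prime(m):
    """Trial-division primality test."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def prime_values(a, b, c, n, length=100, distinct=True):
    """Count prime values of |a x^2 + b x + c| for x in [n, n + length - 1].

    With distinct=True each prime is counted once, however often it occurs.
    """
    values = [abs(a * x * x + b * x + c) for x in range(n, n + length)]
    primes = [v for v in values if is_prime(v)]
    return len(set(primes)) if distinct else len(primes)

print(prime_values(1, 1, 41, 0))                       # x^2 + x + 41 on [0, 99]
print(prime_values(4, 2, 41, -40))                     # 4x^2 + 2x + 41 on [-40, 59]
print(prime_values(1, -69, 1231, 0, distinct=False))   # primes counted with repetition
```

Calling `prime_values(41, 33, -43321, -57)` is the corresponding check for the best example, for which 90 distinct primes are reported.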


4. THE METHODS. The second author began by experimentally gathering
“starters” that produce a lot of primes. For instance, he noticed that $8x^2 - 2x - n$
for $n$ a prime that is 7 or 9 mod 10 and 2 mod 3 works well in practice. This implies
that its discriminant, $4 + 32n$, is 2 mod 3 and 2 or 3 mod 5, i.e. not a square
modulo these primes.

Each equivalence class of quadratics contains a representative of the form
$ax^2 + bx + c$ where $a > 0$ and $a \ge b \ge 0$. We can therefore restrict our attention
to quadratics of this form. Let $D = b^2 - 4ac$.

If $ax^2 + bx + c$ is going to be prime for many values of $x$, then it cannot ever be
even, because its parity depends only on whether $x$ is even or odd. If $x$ is even,
$ax^2 + bx + c \equiv c \pmod 2$. If $x$ is odd, then $ax^2 + bx + c \equiv a + b + c \pmod 2$.
This is why, for our purposes, we need only look at quadratics such that $c$ is odd
and $a + b$ even.

Suppose that $p$ is an odd prime not dividing $a$. Consider the equation $ax^2 +
bx + c \equiv 0 \pmod p$. The quadratic formula gives its solutions in terms of $\sqrt{D}$. It is
then easy to see that $ax^2 + bx + c$ is not divisible by $p$ for any $x$, precisely when $D$
is a quadratic non-residue (i.e. not a square) modulo $p$. Since $ax^2 + bx + c$
(mod $p$) depends only on $x$ (mod $p$), we therefore need $D$ to be a quadratic
non-residue modulo lots of small primes. The first author conducted a search for
$D$ that are quadratic non-residues for at least 10 of the 11 primes between 3 and
37. This search extended from -200,000 to 2,000,000. Positive discriminants were
preferred because they tend to produce better quadratics. In addition, some of the
best discriminants (e.g. $3^2 \cdot 7213$) were multiplied by small squares and then tested.
He later conducted a larger search (from -1,000,000 to 60,000,000), capturing
those $D$ with small $\sum_p [100(1 + n)/p]$, where $n = 0$ if $D \equiv 0 \pmod p$, $n = 1$ if $D$
is a square mod $p$, and $n = -1$ otherwise. (This sum estimates how many
divisibilities by small primes will occur.)
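The residue test and the divisibility estimate described here can be sketched as follows. Two points are assumptions made for this illustration: the sum is taken over the 11 primes from 3 to 37, and the square brackets are read as ordinary grouping rather than as an integer part. The symbol called n in the text is computed via Euler's criterion.

```python
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]  # assumed range of "small primes"

def legendre(D, p):
    """Value n from the text for an odd prime p: 0 if p divides D,
    1 if D is a nonzero square mod p, and -1 otherwise (Euler's criterion)."""
    r = pow(D % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def nonresidue_count(D, primes=SMALL_PRIMES):
    """Number of the given primes modulo which D is a quadratic non-residue."""
    return sum(1 for p in primes if legendre(D, p) == -1)

def divisibility_score(D, primes=SMALL_PRIMES):
    """Sum over p of 100(1 + n)/p: a rough estimate of how many of 100
    consecutive values of the quadratic are divisible by some small prime."""
    return sum(100 * (1 + legendre(D, p)) / p for p in primes)

for D in (-163, 35108, 64917, 7105733):
    print(D, nonresidue_count(D), round(divisibility_score(D), 1))
```

A discriminant passes the first search described above when `nonresidue_count(D)` is at least 10; the second search prefers discriminants for which `divisibility_score(D)` is small.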

Once a suitable $D$ was found, he let $b$ run from 0 to 100, found all possible
corresponding $a$ and $c$ by looking at divisors of $D - b^2$, and then checked each
$ax^2 + bx + c$ so produced to see if it tended to yield lots of primes. If so, it then
qualified for finer testing to find intervals of length 100 on which it would produce 
the most distinct primes. 
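The enumeration step can be sketched as below. It is a reconstruction from the description, not the authors' program: for each b it factors ac = (b^2 - D)/4 and keeps the triples with a > 0, a >= b >= 0, c odd and a + b even, the restrictions stated earlier in this section. The function names are hypothetical.

```python
def divisors(m):
    """Positive divisors of |m|, in increasing order."""
    m = abs(m)
    ds = []
    d = 1
    while d * d <= m:
        if m % d == 0:
            ds.append(d)
            if d != m // d:
                ds.append(m // d)
        d += 1
    return sorted(ds)

def candidates(D, b_max=100):
    """All (a, b, c) with b^2 - 4ac = D, 0 <= b <= b_max, a >= b, a > 0,
    c odd and a + b even."""
    out = []
    for b in range(b_max + 1):
        if (b * b - D) % 4 != 0:
            continue                      # no integer a, c for this b
        ac = (b * b - D) // 4             # from D = b^2 - 4ac
        if ac == 0:
            continue                      # D would be a perfect square
        for a in divisors(ac):
            c = ac // a                   # exact, since a divides |ac|
            if a >= b and c % 2 != 0 and (a + b) % 2 == 0:
                out.append((a, b, c))
    return out

found = candidates(7105733)
print(len(found), (41, 33, -43321) in found, (59, 3, -30109) in found)
```

Each triple returned would then be passed to a prime-counting test over intervals of length 100, which is the finer testing mentioned in the text.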

One problem we ran up against was that of dealing with quadratics of the form
$f(x) = ax^2 + bx + c$ with $a$ dividing $b$ (so $b = 0$ or $a$ by the restrictions imposed
earlier). In these cases, for each $x$ there is another integer $y$ such that $f(x) = f(y)$
and so to avoid repetition we have to restrict the intervals considered. A way
around this for quadratics $ax^2 + ax + c$, noticed by the second author, is to
consider instead the related quadratic $4ax^2 + 2ax + c$. Note, for example, how
$x^2 + x + 41$ represents at most 86 primes for 100 consecutive $x$, but the related
$4x^2 + 2x + 41$ does 2 better.


5. THE CHALLENGE. So here’s the challenge. Can you beat 90/100? In fact can
you make 100/100? According to a famous conjecture in number theory you can.
This is Schinzel’s Hypothesis (H) [14], which says that if $f_1(x), \ldots, f_s(x)$ are
irreducible polynomials with integer coefficients, such that no integer $n > 1$
divides $f_1(x) \cdots f_s(x)$ for all integers $x$, then there should exist infinitely many $x$
such that $f_1(x), \ldots, f_s(x)$ are simultaneously prime.

Consider, for instance, $f(x) = x^2 + x + 17959429571$ (gleaned from [5]). Let
$s = 100$ and $f_i(x) = f(x + i)$ for $i = 1, \ldots, 100$. We show that


(*) no integer $n > 1$ divides $f_1(x) \cdots f_{100}(x)$ for all $x$.


Hypothesis (H) then says that there should be infinitely many $x$ for which 100
consecutive values of $f(x)$ are prime. These primes are distinct ($f(x) = f(y)$ if and
only if $y = x$ or $-1 - x$, so the 100 values of $x$ would have to straddle 0 for
repetition, which they don’t). An equivalent quadratic will then be prime for all
$x = 0, \ldots, 99$.

To prove (*), suppose a prime $n$ divides $f_1(x) \cdots f_{100}(x)$ for all $x$. For a start,
$n > 127$, since the discriminant of $f$, $-71837718283$, is not a square modulo any of the
primes between 3 and 127. For $n = 131$, the roots of $f(x)$ mod $n$ are 63 and 67.
Choosing $x = 68$ then shows that this $n$ does not satisfy our hypothesis. For each
prime $n > 131$, either $f$ has no roots mod $n$ or has roots spaced by $> 100$. (Note
that for primes $> 200$ this is automatic.)
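The case analysis in this argument can be mirrored by a direct computation. The sketch below finds the roots of f modulo a prime by brute force and looks for an x such that none of f(x + 1), ..., f(x + 100) is divisible by that prime; the function names are illustrative.

```python
C = 17959429571  # constant term of f(x) = x^2 + x + C

def roots_mod(p, c=C):
    """Residues x mod p with x^2 + x + c divisible by p."""
    return [x for x in range(p) if (x * x + x + c) % p == 0]

def has_clear_window(p, c=C, length=100):
    """True if some x exists with none of f(x+1), ..., f(x+length) divisible by p,
    so that p cannot divide the product f_1(x) ... f_length(x) for every x."""
    roots = set(roots_mod(p, c))
    if not roots:
        return True
    for x in range(p):
        if all((x + i) % p not in roots for i in range(1, length + 1)):
            return True
    return False

print(1 - 4 * C)             # discriminant of f
print(roots_mod(131))        # roots of f modulo 131
print(has_clear_window(131))
```

Running `has_clear_window(p)` over the primes up to 200 is then a mechanical way to recheck (*), since larger primes are handled by the spacing remark in the text. By contrast, `has_clear_window(41, c=41)` fails for Euler's quadratic, because 41 divides one of any 41 consecutive values.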

In practice, such a sequence of 100 values could only occur for astronomically 
large x. Yihsiang Liow has searched for sequences of consecutive prime values of f 
for $x < 100{,}000{,}000$. No sequence found so far has more than twenty terms.

It is interesting to note that similar congruence conditions show that in any
interval of length 100 at least 6 values of our best quadratic $f(x) = 41x^2 + 33x -
43321$ must be composite. Hypothesis (H) would then say that there exists an
interval of length 100 containing 94 distinct prime values of f(x). It is no wonder 
that Hypothesis (H) is doubted by a number of people. 


6. RELATED RESULTS. In conclusion, here are a few interesting related results.
The last section discussed whether it would be possible to get 100 distinct
consecutive prime values of a quadratic. As a more modest aim, consider the
question of beating Euler’s example $x^2 + x + 41$, which represents 40 distinct
primes for consecutive values of $x$. In late 1988, Gilbert Fung and Russell Ruby
did this, their respective examples being $47x^2 - 1701x + 10181$ and $36x^2 - 810x
+ 2753$ [12]. These give distinct primes for $x = 0, 1, \ldots, 42$ and $x = 0, 1, \ldots, 44$
respectively.

Another question attacked by the second author (and unbeaten by the efforts 
on computer of the first author) is to find quadratics that are even half the time 
but that represent distinct primes for as many of the 50 remaining x as possible. 
The examples $-4x^2 + 381x - 8524$ and $-2x^2 + 185x - 3181$ each give 48 distinct
primes for $0 \le x \le 99$. Can you find an example that gives 49 or even 50
distinct primes?


REFERENCES 


1. A. Baker, Linear forms in the logarithms of algebraic numbers, Mathematika 13 (1966), 204-216. 
2. A. Baker, Imaginary quadratic fields with class number two, Ann. of Math. 94 (1971), 139-152. 
3. L.E. Dickson, History of the theory of numbers, Chelsea, New York, 1971. 

4. L. Euler, Mém. de Berlin, année 1722, 36, Comm. Arithm. 1, 584. 

5. G. W. Fung and H. C. Williams, Quadratic polynomials which have a high density of prime values, 
Math. Comp. 55 (1990), 345-353. 

6. M.L. Greenwood, Mathematics Newsletter, 31 issues. Vol. I, Mathematics by a non-mathematician, 
245pp handwritten, San Diego Public Library, Main Library, and White Tower Library, Los 
Angeles. 

7. G.H. Hardy and J. E. Littlewood, Partitio numerorum III: On the expression of a number as a 
sum of primes, Acta Math. 44 (1923), 48. 

8. K. Heegner, Diophantische Analysis und Modulfunktionen, Math. Z. 56 (1952), 227-253. 

9. M. D. Hendy, Prime quadratics associated with complex quadratic fields of class number two, 
Proc. Amer. Math. Soc. 43 (1974), 253-260. 

10. E. Karst, New quadratic forms with high density of primes, Elem. d. Math. 28 (1973), 116-118. 

11. D.N. Lehmer, List of prime numbers 1 to 10,006,721, Carnegie Institution of Washington, 1914. 

12. R.A. Mollin and H. C. Williams, Class number problems for real quadratic fields, Number theory 
and cryptography, vol. 154, LMS Lecture Note Series, 1990. 

13. G. Rabinovitch, Eindeutigkeit der Zerlegung in Primzahlfaktoren in quadratischen Zahlkörpern, 
Proc. Fifth Internat. Congress Math. (Cambridge), vol. I, 1913, pp. 418-421. 

14. A. Schinzel and W. Sierpinski, Sur certaines hypothèses concernant les nombres premiers. 
Remarque, Acta Arithm. 4 (1958), 185-208. 

15. D. Shanks, On the conjecture of Hardy and Littlewood concerning the number of primes of the 
form n² + a, Math. Comp. 14 (1960), 321-332. 

16. H. M. Stark, A complete determination of the complex quadratic fields of class-number one, 
Michigan Math. J. 14 (1967), 1-27. 

17. H. M. Stark, A transcendence theorem for class number problems, Ann. of Math. 94 (1971), 
153-173. 

18. S. A. Weintraub, Consecutive primes in arithmetic progressions, J. of Math. Rec. 25, no. 3 (1993), 
169-171. 


Department of Mathematics 
University of Illinois 
Urbana, IL 61801 
boston@math.uiuc.edu 

3945 Alabama Street 
San Diego, CA 92104-2701 


For runners, it’s often frustrating to hear other athletes dismiss their 
sport as a torturous yet necessary evil that must be confronted if one is 
going to “get in shape.” (Soccer players, basketball players, et al. are 
notorious for this sort of unwarranted aggression.) The field of 
mathematics, to draw what I hope is fresh analogy, is akin to running 
in this very respect: it is a discipline often written off by engineers, 
biologists, chemists, and other applied science types as a laborious 
means to an end. What the cynics fail to appreciate is the degree to 
which a simple, cogent proof or a graceful, powerful stride are things 
of beauty independent of their utility with respect to other practices. 
For those who align themselves with the engineers and the soccer 
players, please read no further! Mathematical induction (or simply 
“induction” as it shall henceforth be known) is a technique whose 
subtlety and force you will fail to appreciate. 


Ben Rutter (sophomore at Swarthmore) 


(Introductory paragraph in a paper explaining induction written for a discrete math 
course.) 
How to Write a Proof 


Leslie Lamport 


1. MATHEMATICAL PROOFS. Mathematical notation has improved over the 
past few centuries. In the seventeenth century, a mathematician might have written 


    There do not exist four positive integers, the last being greater than two, 
    such that the sum of the first two, each raised to the power of the fourth, 
    equals the third raised to that same power.                              (1) 


How much easier it is to read the modern version 


    There do not exist positive integers x, y, z, and n, with n > 2, such that 
    xⁿ + yⁿ = zⁿ.                                                            (2) 


Yet, the structure of mathematical proofs has not changed in 300 years. The proofs 
in Newton’s Principia differ in style from those of a modern textbook only by being 
written in Latin. Proofs are still written like essays, in a stilted form of ordinary 
prose. 

Formulas written in prose, like (1), are hard to understand and hard to get right. 
Proofs written in prose are also hard to understand and hard to get right. 
Anecdotal evidence suggests that as many as a third of all papers published in 
mathematical journals contain mistakes—not just minor errors, but incorrect 
theorems and proofs. 

Statement (2) is easier to read than statement (1) for two reasons: variables are 
given names, and formulas are written in a more structured fashion. The benefit 
of using names is obvious. The benefit of structure is less obvious; we are so used 
to formulas like xⁿ + yⁿ = zⁿ that we tend to take their structure for granted, and 
to think they are easy to read just because they are short. Although the brevity of 
the formula helps, it is primarily its structure that makes it easier to understand 
than a prose version. The expression 


        x raised to the power n 
    plus 
        y raised to the power n 
equals 
    z raised to the power n 


is quite long, but it is easy to read because of its structure. 

The same principles that make formulas easier to understand can make proofs 
easier to understand: proof steps should be referred to by name, and the structure 
of the proof should be manifest. 

The proof style I advocate is a refinement of one, called natural deduction, that 
has been used by some logicians for almost a century. Natural deduction has been 
viewed primarily as a method of writing proofs in a formal logic. What I will 
describe is a practical method for writing the less formal proofs of ordinary 
mathematics. It is based on hierarchical structuring—a successful tool for manag- 
ing complexity. 
A method for structuring proofs was presented by Leron [5]. However, his goal 
was to communicate proofs better, not to make them more rigorous. Despite their 
hierarchical structuring, the proofs Leron advocated are quite different from the 
ones presented here. They do not seem to be any better than conventional proofs 
for avoiding errors. 

Avoiding mistakes when manipulating formulas requires careful, detailed calcu- 
lations. Avoiding mistakes when proving theorems requires careful, detailed proofs. 
When first shown a detailed, structured proof, most mathematicians react: “I don’t 
want to read all those details; I want to read only the general outline and perhaps 
some of the more interesting parts.” My response is that this is precisely why they 
want to read a hierarchically structured proof. The high-level structure provides 
the general outline; readers can look at as much or as little of the lower-level detail 
as they want. However, until one gets used to them, structured proofs do look 
intimidating. 

The ideal tool for reading a structured proof would be a computer-based 
hypertext system. It would allow the reader to concentrate on a particular level in 
the structure, suppressing lower-level details. In a printed version, one can ignore 
lower-level details only by skipping over that part of the text. While this is not 
ideal, the structure is displayed by the format, making such skipping fairly 
easy—certainly much easier than in a prose-style proof, where the format provides 
little clue to the logical structure. 


2. AN EXAMPLE. I take as an example the classic proof that √2 is irrational. 
Letting Q denote the set of rationals, the precise statement of the result to be 
proved is 


Theorem. There does not exist r in Q such that r² = 2. 


To illustrate hierarchical structure, the proof is carried out to a much lower level 
of detail than necessary for a typical reader. 


2.1. The High-Level Proof. The high-level structure of the proof—what one would 
see first with a hypertext system—appears in Figure 1. The proof assumes a lemma 
from which one can deduce that, for any integer n, if 2 divides n² then 2 divides n. 
The set of integers is denoted by Z. 


Theorem. There does not exist r in Q such that r² = 2. 


PROOF SKETCH: We assume r² = 2 for r ∈ Q and obtain a contradiction. Writing 
r = m/n, where m and n have no common divisors (Step 1), we deduce from 
(m/n)² = 2 and the lemma that both m and n must be divisible by 2 (Steps 2 and 3). 


ASSUME: 1. r ∈ Q 
        2. r² = 2 
PROVE:  False 
1. Choose m, n in Z such that 
     1. gcd(m, n) = 1 
     2. r = m/n 
2. 2 divides m. 
3. 2 divides n. 
4. Q.E.D. 


Figure 1. The highest level of a structured proof of the irrationality of √2. 
After the statement of the theorem comes a PROOF SKETCH, which is an 
informal explanation of the following proof. The proof sketch serves as a “road 
map” to the proof, helping the reader understand intuitively why the proof works. 
This proof is so simple that the proof sketch is almost superfluous—the only 
information it provides that is not obvious from the high-level proof itself is that 
the lemma is used to prove Steps 2 and 3. 

Next come the ASSUME and PROVE clauses. They assert that to prove the 
theorem, it suffices to assume the two hypotheses r ∈ Q and r² = 2, and to prove 
false. 

Finally comes the proof. This is a sequence of statements that ends with 
“Q.E.D.,” which denotes the assertion to be proved—in this case, false. Think of 
this proof as the left half (the statements) of a high-school geometry style proof, 
the right half (the reasons) being omitted.¹ 


2.2. Lower Levels of the Proof. Let us now examine the proof of Step 1, which 
appears in Figure 2. It is clear enough what must be proved, so no ASSUME / PROVE 
is needed. The proof consists of five steps, numbered 1.1 through 1.5. There is also 
a LET statement, which defines the required m and n. (I prefer ≜ to the more 
common symbol ≡ for “equals by definition,” since ≡ can also mean logical 
equivalence.) 


1. Choose m, n in Z such that 
     1. gcd(m, n) = 1 
     2. r = m/n 
  1.1. Choose p, q in Z such that q ≠ 0 and r = p/q. 
  LET: m ≜ p/gcd(p, q) 
       n ≜ q/gcd(p, q) 
  1.2. m, n ∈ Z 
  1.3. r = m/n 
  1.4. gcd(m, n) = 1 
  1.5. Q.E.D. 


Figure 2. The proof of Step 1. 


Each of these five steps in turn has its proof. The proof of 1.1 is just 
PROOF: By assumption :1. 


Assumption :1 is the first assumption (r ∈ Q) in the proof of the theorem. (The 
numbering scheme for assumptions is explained below.) A hierarchical proof must 
stop somewhere. The general question of where to stop is addressed in Section 4.2. 
In this proof, we assume the reader understands that the definition of Q implies 
that r can be written as the requisite quotient of integers. The proof of 1.2 is the 
equally simple 


PROOF: 1.1 and definition of m and n. 


¹ In their introductory plane geometry course, students in the U.S. are taught to write proofs in a 
two-column format, the left column containing a sequence of statements and the right column 
containing their justifications. 


Step 1.3 is proved by a string of equalities, each with a brief justification. 


PROOF: m/n = (p/gcd(p, q)) / (q/gcd(p, q))    [Definition of m and n] 
           = p/q                               [Simple algebra] 
           = r                                 [By 1.1] 


This type of proof, consisting of a string of equalities, is simple and direct; it works 
as well for proving any transitive relation, such as < , logical equivalence, and 
implication. It should be used whenever possible. 

Step 1.4 has the multistep proof shown in Figure 3, consisting of Steps 1.4.1 
through 1.4.3. The “1.4:1” in the proof of Step 1.4.1 denotes assumption 1 
(s divides m) in the proof of Step 1.4. The theorem itself is considered to be a step 
having the null string as its number, which explains why “:1” denotes assumption 1 
of the theorem. 


1.4. gcd(m, n) = 1 
  PROOF: By the definition of the gcd, it suffices to: 
    ASSUME: 1. s divides m 
            2. s divides n 
    PROVE:  s = ±1 
  1.4.1. s · gcd(p, q) divides p. 
    PROOF: 1.4:1 and the definition of m. 
  1.4.2. s · gcd(p, q) divides q. 
    PROOF: 1.4:2 and definition of n. 
  1.4.3. Q.E.D. 
    PROOF: 1.4.1, 1.4.2, and the definition of gcd. 


Figure 3. The proof of Step 1.4. 
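
The construction in Steps 1.1 through 1.4 is effectively an algorithm: divide p and q by their gcd. The following is a minimal sketch of that construction, offered only as an illustration; the function name `lowest_terms` is hypothetical, and the sketch relies on Python's standard `math.gcd` and `fractions.Fraction`.

```python
from fractions import Fraction
from math import gcd


def lowest_terms(p: int, q: int) -> tuple[int, int]:
    """Given r = p/q with q != 0 (Step 1.1), return (m, n) as in the LET statement."""
    if q == 0:
        raise ValueError("q must be nonzero")
    g = gcd(p, q)  # positive because q != 0
    # Both divisions are exact because g divides p and q, so m and n are integers (Step 1.2).
    return p // g, q // g


if __name__ == "__main__":
    p, q = 84, -36
    m, n = lowest_terms(p, q)
    print(m, n)                               # the pair (m, n)
    print(Fraction(m, n) == Fraction(p, q))   # Step 1.3: r = m/n
    print(gcd(m, n) == 1)                     # Step 1.4: gcd(m, n) = 1
```

Running the checks on examples does not replace the proof, but it mirrors its structure: each printed check corresponds to one numbered step.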


3. FURTHER DETAILS 


3.1. A More Compact Numbering Scheme. The numbering scheme used in the 
example is fine for short proofs, with few levels of nesting. However, long proofs 
can have many levels—I often write proofs more than six levels deep. The number 
3.1.1.1.1.2 takes a lot of space, and having to distinguish it from 3.1.1.1.2 can soon 
lead to eye strain. 

We eliminate long step numbers by abbreviating 3.1.1.1.2, a five-part step 
number ending in 2, as ⟨5⟩2. Figure 4 shows a fragment of a proof written with the 
two numbering styles. To understand why abbreviated numbers suffice, consider 
where Step 3.1.1.1.2 can be used in this proof. The step can be used only after it is 
proved, but it cannot be used just anywhere after its proof. Step 3.1.1.1.2 cannot be 
used in the proof of Step 3.1.1.2 because it was proved under the assumption of 
Step 3.1.1.1, which is different from Step 3.1.1.2’s assumption. The step can be 
used only where the assumptions under which it was proved hold, which means 
that it can be used only within the proof of its parent, Step 3.1.1.1. Step 3.1.1.1.2 is 
the only one in the proof of its parent with a five-part number ending in 2. 
Although there can be many proof steps with the same abbreviated number ⟨5⟩2, 
no two of them have the same parent, so at most one of them may be used at any 
point in the proof. A reference to Step ⟨5⟩2 always refers to the most recent step 
with that number. Part 3 of the statement of Step ⟨5⟩2 is numbered ⟨5⟩2.3. 


3.1.1.1. ASSUME: x ∈ S                         ⟨4⟩1. ASSUME: x ∈ S 
         PROVE: ...                                  PROVE: ... 
  3.1.1.1.1. ...                                 ⟨5⟩1. ... 
  3.1.1.1.2. ...                                 ⟨5⟩2. ... 
  3.1.1.1.3. Q.E.D.                              ⟨5⟩3. Q.E.D. 
    By 3.1.1.1.1 and assumption 3.1.1.1.           By ⟨5⟩1 and assumption ⟨4⟩. 
3.1.1.2. ASSUME: x ∈ T                         ⟨4⟩2. ASSUME: x ∈ T 
         PROVE: ...                                  PROVE: ... 


Figure 4. Part of a proof, with long and abbreviated step numbers. 
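
The abbreviation rule is mechanical: keep the number of parts (the level) and the last part. The following is a small illustrative sketch, assuming step numbers are dot-separated digit strings and writing the level in angle brackets; the helper names `abbreviate` and `parent` are hypothetical.

```python
def _parts(step: str) -> list[str]:
    """Split a step number such as '3.1.1.1.2' (a trailing dot is tolerated)."""
    parts = step.strip(".").split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"not a step number: {step!r}")
    return parts


def abbreviate(step: str) -> str:
    """Abbreviate a long step number: level in angle brackets, then the last part."""
    parts = _parts(step)
    return f"⟨{len(parts)}⟩{parts[-1]}"


def parent(step: str) -> str:
    """Long number of the enclosing step; empty string for a top-level step."""
    return ".".join(_parts(step)[:-1])


if __name__ == "__main__":
    for long_number in ["3.1.1.1.", "3.1.1.1.1", "3.1.1.1.2", "3.1.1.2"]:
        print(long_number, "->", abbreviate(long_number), "parent:", parent(long_number))
```

Two different steps can share an abbreviation (for example 3.1.1.1.2 and 3.1.1.2.2), but their parents differ, which is exactly why a reference can be resolved to the most recent step with that number.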

References to assumptions can be abbreviated even more. An assumption can 
be used only in the proof of a step, or the proof of one of its descendants. We let 
⟨5⟩ denote the assumption of the level-five step that is an ancestor of (or is) the 
current step, and ⟨5⟩:3 denote the third numbered part of that assumption. Since 
the statement of the theorem has a zero-part number, its assumption is number 
⟨0⟩. 

Figure 5 contains the complete proof of our example, written with the abbrevi- 
ated numbering scheme. 


3.2. Proof by Cases. Proof by cases can be expressed with a Case step, where 
CASE: Statement of assumption. 
is an abbreviation for 


ASSUME: Statement of assumption. 
PROVE: Q.E.D. 


The proof of the final “Q.E.D.” step explains why the cases considered are 
exhaustive; it is usually simple. Figure 6 illustrates the use of the Case construct to 
structure a proof by induction. Note how Step ⟨1⟩1 is used in the proofs of both 
cases, showing why Case steps provide more flexibility than would a strictly 
hierarchical proof-by-cases construct. 


Theorem. There does not exist r in Q such that r² = 2. 


PROOF SKETCH: We assume r² = 2 for r ∈ Q and obtain a contradiction. Writing r = m/n, 
where m and n have no common divisors (Step ⟨1⟩1), we deduce from (m/n)² = 2 and the 
lemma that both m and n must be divisible by 2 (⟨1⟩2 and ⟨1⟩3). 


ASSUME: 1. r ∈ Q 
        2. r² = 2 
PROVE:  False 
⟨1⟩1. Choose m, n in Z such that 
        1. gcd(m, n) = 1 
        2. r = m/n 
  ⟨2⟩1. Choose p, q in Z such that q ≠ 0 and r = p/q. 
    PROOF: By assumption ⟨0⟩:1. 
  LET: m ≜ p/gcd(p, q) 
       n ≜ q/gcd(p, q) 
  ⟨2⟩2. m, n ∈ Z 
    PROOF: ⟨2⟩1 and definition of m and n. 
  ⟨2⟩3. r = m/n 
    PROOF: m/n = (p/gcd(p, q)) / (q/gcd(p, q))    [Definition of m and n] 
               = p/q                               [Simple algebra] 
               = r                                 [By ⟨2⟩1] 
  ⟨2⟩4. gcd(m, n) = 1 
    PROOF: By the definition of the gcd, it suffices to: 
      ASSUME: 1. s divides m 
              2. s divides n 
      PROVE:  s = ±1 
    ⟨3⟩1. s · gcd(p, q) divides p. 
      PROOF: ⟨2⟩:1 and the definition of m. 
    ⟨3⟩2. s · gcd(p, q) divides q. 
      PROOF: ⟨2⟩:2 and definition of n. 
    ⟨3⟩3. Q.E.D. 
      PROOF: ⟨3⟩1, ⟨3⟩2, and the definition of gcd. 
  ⟨2⟩5. Q.E.D. 
⟨1⟩2. 2 divides m. 
  ⟨2⟩1. m² = 2n² 
    PROOF: ⟨1⟩1.1 implies (m/n)² = 2. 
  ⟨2⟩2. Q.E.D. 
    PROOF: By ⟨2⟩1 and the lemma. 
⟨1⟩3. 2 divides n. 
  ⟨2⟩1. Choose p in Z such that m = 2p. 
    PROOF: By ⟨1⟩2. 
  ⟨2⟩2. n² = 2p² 
    PROOF: 2 = (m/n)²     [⟨1⟩1.2 and ⟨0⟩:2] 
             = (2p/n)²    [⟨2⟩1] 
             = 4p²/n²     [Algebra] 
    from which the result follows easily by algebra. 
  ⟨2⟩3. Q.E.D. 
    PROOF: By ⟨2⟩2 and the lemma. 
⟨1⟩4. Q.E.D. 
  PROOF: ⟨1⟩1.1, ⟨1⟩2, ⟨1⟩3, and definition of gcd. 


Figure 5. A proof of the irrationality of √2. 


Theorem. All natural numbers are interesting. 


ASSUME: n a natural number. 
PROVE:  n is interesting. 


⟨1⟩1. A number is interesting if it is the smallest number not 
      in an interesting set. 
  PROOF: By definition of interesting. 
⟨1⟩2. CASE: n = 0 
  PROOF: By ⟨1⟩1, since 0 is the smallest natural number not in ∅. 
⟨1⟩3. CASE: 1. n > 0 
            2. n − 1 is interesting 
  PROOF: By ⟨1⟩1, since case assumption ⟨1⟩ implies that 
  {k : k ≤ n − 1} is interesting. 
⟨1⟩4. Q.E.D. 
  PROOF: Steps ⟨1⟩2 and ⟨1⟩3, assumption ⟨0⟩, 
  and mathematical induction. 


Figure 6. The Case construct. 


4. HOW GOOD ARE STRUCTURED PROOFS? 


4.1. My Experience. Some twenty years ago, I decided to write a proof of the 
Schroeder-Bernstein theorem for an introductory mathematics class. The simplest 
proof I could find was in Kelley’s classic general topology text [4, page 28]. Since 
Kelley was writing for a more sophisticated audience, I had to add a great deal of 
explanation to his half-page proof. I had written five pages when I realized that 
Kelley’s proof was wrong. Recently, I wanted to illustrate a lecture on my proof 
style with a convincing incorrect proof, so I turned to Kelley. I could find nothing 
wrong with his proof; it seemed obviously correct! Reading and rereading the 
proof convinced me that either my memory had failed, or else I was very stupid 
twenty years ago. Still, Kelley’s proof was short and would serve as a nice example, 
so I started rewriting it as a structured proof. Within minutes, I rediscovered the 
error. 

My interest in proofs stems from writing correctness proofs of algorithms. These 
proofs are seldom deep, but usually have considerable detail. Structured proofs 
provided a way of coping with this detail. The style was first applied to proofs of 
ordinary theorems in a paper I wrote with Martin Abadi [1]. He had already 
written conventional proofs, proofs that were good enough to convince us and, 
presumably, the referees. Rewriting the proofs in a structured style, we discovered 
that almost every one had serious mistakes, though the theorems were correct. Any 
hope that incorrect proofs might not lead to incorrect theorems was destroyed in 
our next collaboration [3]. Time and again, we would make a conjecture and write 
a proof sketch on the blackboard—a sketch that could easily have been turned 
into a convincing conventional proof—only to discover, by trying to write a 
structured proof, that the conjecture was false. Since then, I have never believed a 
result without a careful, structured proof. My skepticism has helped avoid numer- 
ous errors. 

I have also found structured proofs very helpful when I need a variant of an 
existing theorem, perhaps with a slightly weaker hypothesis. In a properly written 
proof, where every use of an assumption or a proof step is explicit, simple text 
searching reveals exactly where every hypothesis is used. 


4.2. Writing Structured Proofs. A structured proof format by itself will not elimi- 
nate errors. Proofs must be written carefully, with enough detail. Most errors come 
from not carrying out the proof to enough levels. The lowest-level, paragraph-style 
proofs should be short and completely transparent. One must be a skeptical reader 
of one’s own proofs. My own rule of thumb is to expand the proof until the lowest 
level statements are obvious, and then continue for one more level. This takes 
discipline. But, unlike conventional proofs, in which adding more detail can make 
a proof more confusing, structured proofs accommodate as much detail as desired. 

Structured proofs are longer than conventional ones. Although the formatting is 
partly responsible, structured proofs are longer mainly because they include more 
detail. They make it obvious when steps have been forgotten or important details 
omitted. They make it hard to be sloppy. The assertion “this case is similar to the 
previous one” is not acceptable; one is forced to find the appropriate general step 
that makes the proof of both cases easy. Writing a rigorous proof is harder than 
writing a sloppy one, and lazy writers will find excuses to avoid doing it. A common 
excuse is that structured proofs are too long. But, shorter proofs are not necessar- 
ily better ones; the shortest proof is always “left as an exercise for the reader.” 

When journals are distributed electronically, they can include proofs down to 
the lowest reasonable level; the reader can suppress uninteresting details when 
viewing the article on the screen or printing it locally. But, for paper journals, extra 
pages mean killing extra trees. It may be inappropriate for a journal to print a 
proof with so much detail. I recommend that authors provide two versions of their 
proofs: a very detailed one for themselves, the referees, and interested colleagues; 
and a less detailed one for paper publication. It is quite easy to convert a detailed 
proof into a less detailed one by compressing the lower levels into paragraph-style 
proofs. Although the reader must fill in the low-level details, such proofs are much 
better than unstructured ones, in which authors seem to choose randomly which 
details to supply and which to omit. 


4.3. Reading Structured Proofs. So far, readers’ reactions to structured proofs 
have been mixed. Skeptical readers—ones who check for errors—like these proofs 
much more than conventional ones. Readers who want to skim the proofs are less 
happy with the style. Part of the problem is that the length of the proofs and the 
unfamiliar format are intimidating. The best way to read a structured proof is level 
by level: first reading the high-level steps ⟨1⟩1, ⟨1⟩2, ⟨1⟩3, ..., then the proofs of 
those steps, and so on. However, having to skip over the lower-level steps makes 
reading the high-level ones inconvenient. With hypertext, this is not a problem. 
With printed text, a layered presentation—the complete higher-level proof fol- 
lowed by the lower-level proofs—may help [2, section B.7 (page 48)]. 

These structured proofs do not seem ideal for someone who wants to understand 
the important ideas of a proof without reading any of the details. Satisfying 
such readers may just require better proof sketches. Or, perhaps a better way of 
annotating a proof with comments is needed. Hypertext can provide graphical aids 
for finding one’s way around a proof and highlighting important steps. Maybe such 
aids can be developed for the printed page. 


4.4. The Future. Modern mathematical notation has evolved over hundreds of 
years. Its proof style is still stuck in the seventeenth century. Mathematicians tend 
to be conservative, and many are unwilling to consider that there might be a better 
way of writing proofs. But, I am told that mathematicians are embarrassed to learn 
that they published incorrect theorems, so they are motivated to avoid errors. I 
believe they will like structured proofs if they can be persuaded to try them. 

Computer scientists are more willing to explore unconventional proof styles. 
Unfortunately, I have found that few of them care whether they have published 
incorrect results. They often seem glad that an error was not caught by the 
referees, since that would have meant one fewer publication. I fear that few 
computer scientists will be motivated to use a proof style that is likely to reveal 
their mistakes. Structured proofs are unlikely to be widely used in computer 
science until publishing incorrect results is considered embarrassing rather than 
normal. 

The proof style described here has been developed over the past several years. I 
have written many hundreds of pages of structured proofs, mostly of algorithms. I 
consider the style to be a great improvement over conventional, unstructured 
proofs. But, this is not the last word on the subject. I look forward to seeing 
structured proof styles evolve as mathematicians and computer scientists find 
better ways to write a proof. 


ACKNOWLEDGMENTS. My information about mathematicians’ errors and embarrassment comes 
mainly from George Bergman. The Case construct and several other details of the proof format were 
developed in discussions with Urban Engberg and Peter Grønning. Peter Dickman and Lyle Ramshaw 
found errors in an earlier version. 
GRAPHING CALCULATOR MATHEMATICS 


Today, a graduate student in a second term of graduate complex 
analysis proudly displayed a program on a graphing calculator to map 
the unit circle onto an ellipse with a linear fractional transformation. 
Queried whether “we” had proved early in the first term that linear 
fractional transformations map generalized circles onto generalized 
circles, the student then busily checked something on the graphing 
calculator without first paying any attention to the “proved” theorem. 
The student’s reaction corroborates rumors of departments about 
mathematics that promote graphing explorations while forbidding 
proofs in calculus. In a department of mathematics, however, calls for 
proofs soon brought the student back to the complexified reality. The 
incident demonstrates the value of research and assessment of 
technology in mathematics education and of the resulting graduates’ 
employability. 


Yves Nievergelt 

Department of Mathematics 
Eastern Washington University 
Cheney, WA 99004-2431 
Searching for Common Generalizations: 
The Case of Hyperbolic Functions 


Kenneth B. Stolarsky 


1. INTRODUCTION. Given two different but related facts, it can be worthwhile 
to find a single simple statement that contains both of them. For example, Max 
Planck found after a long deliberate search a law of thermodynamics that included 
both the Rayleigh Jeans law for long wavelength blackbody radiation and the Wien 
law for short wavelength blackbody radiation. The significance of this went far 
beyond having “one less law to remember”; it led to quantum mechanics, one of 
the greatest achievements of twentieth century science. A parallel example in 
mathematics could have arisen from an attempt to find a good common generaliza- 
tion of the laws 


$$\sin(z + 2\pi) = \sin(z) \tag{1.1}$$

and

$$\tanh(z + i\pi) = \tanh(z). \tag{1.2}$$


It does not in fact seem to be the case that any late eighteenth century mathemati- 
cian deliberately set out upon this task. But it could have led to fame commensu- 
rate with Planck’s. With hindsight, define y(z) as the solution to 


$$\frac{dy}{dz} = (1 - y^2)^{1/2} (1 - k^2 y^2)^{1/2}, \qquad y(0) = 0, \tag{1.3}$$


where $0 \le k \le 1$. For $k = 0$ we have $y(z) = \sin z$ and for $k = 1$ we have $y(z) = \tanh z$. 
For $0 < k < 1$ the function $y(z)$ has a real period $A(k)$ and a purely 
imaginary period $iB(k)$, where $A(k) \to 2\pi$ and $B(k) \to \infty$ as $k \to 0$, and $A(k) \to \infty$ 
and $B(k) \to \pi$ as $k \to 1$. This leads to the theory of doubly periodic functions, i.e. 
elliptic functions, one of the greatest achievements of nineteenth century mathematics. 
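
The two endpoint cases can be checked numerically. The following is a rough sketch under stated assumptions, not part of the original argument: it integrates the initial value problem with a classical fourth-order Runge-Kutta step (a swapped-in numerical method), stays on 0 ≤ z < π/2 so that y remains below 1, and uses the hypothetical name `solve_y`.

```python
import math


def solve_y(k: float, z_end: float, steps: int = 10_000) -> float:
    """Approximate y(z_end) for dy/dz = sqrt(1 - y^2) * sqrt(1 - k^2 y^2), y(0) = 0.

    Classical RK4 with a fixed step; intended only for 0 <= z_end < pi/2.
    """
    def rhs(y: float) -> float:
        # max(..., 0.0) guards against tiny negative arguments from rounding.
        return math.sqrt(max(1.0 - y * y, 0.0)) * math.sqrt(max(1.0 - k * k * y * y, 0.0))

    h = z_end / steps
    y = 0.0
    for _ in range(steps):
        s1 = rhs(y)
        s2 = rhs(y + 0.5 * h * s1)
        s3 = rhs(y + 0.5 * h * s2)
        s4 = rhs(y + h * s3)
        y += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return y


if __name__ == "__main__":
    z = 1.0
    print(solve_y(0.0, z), math.sin(z))    # k = 0 should reproduce sin z
    print(solve_y(1.0, z), math.tanh(z))   # k = 1 should reproduce tanh z
    print(solve_y(0.5, z))                 # an intermediate value of k
```

For intermediate k the computed value lies between tanh z and sin z, since the right-hand side decreases as k increases.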

Our theme is the question of what constitutes a good common generalization of 
previously disparate mathematical facts. In §2 we discuss this in general, with a 
variety of examples, some of which belong in every mathematician’s repertoire. In 
the following sections we restrict ourselves to considering various pairs or triplets 
of facts about hyperbolic functions, and create common generalizations. There is no 
pretense to any finality here—the reader may find other or better generalizations. 
Perhaps our most curious result is the inequality (6.1). (Since its proof hinges on 
the “lucky” factorization of Lemma 3 rather than upon any systematic method, it 
incidentally motivates a question that has been asked before: is there an algorithm 
for deciding the truth of any “similar” inequality involving hyperbolic functions? 
(Roughly, can the Tarski algorithm be extended to include real exponentiation?) 
There isn’t one as yet, but (see §8) mathematical logic has recently shed some light 
on this in a surprising way.) 

We now ask whether “creativity” in any mathematical area can be “mechanized” 
by the process of selecting pairs of theorems and “systematically” searching for 
common generalizations. This seems unlikely, especially since there may be (see 
§2) a vast number of “useless” generalizations. However, there is at least one area 
of research where simply looking for generalizations (even of single facts) has been 
paying off well for more than a century, the area of “q-analogues.” Roughly 
speaking, one tries to replace every integer $n$ by $(q^n - 1)/(q - 1)$ and every 
derivative $f'(x)$ by $(f(qx) - f(x))/(qx - x)$, where $|q| < 1$. Many results of depth 
and importance have come, e.g., from q-analogues of identities between hypergeometric 
functions. Before going on, we observe that this area provides a very 
straightforward example of a common generalization. The Taylor expansion 

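$$f'(x) = \sum_{n=1}^{\infty} \frac{f^{(n)}(0)}{n!}\, n\, x^{n-1} \tag{1.4}$$

has the q-analogue

$$\frac{f(qx) - f(x)}{qx - x} = \sum_{n=1}^{\infty} \frac{f^{(n)}(0)}{n!} \cdot \frac{q^n - 1}{q - 1}\, x^{n-1}, \tag{1.5}$$

which resembles the familiar

$$\frac{f(x) - f(a)}{x - a} = \sum_{n=1}^{\infty} \frac{f^{(n)}(a)}{n!}\, (x - a)^{n-1}. \tag{1.6}$$

A common generalization of (1.5) and (1.6) is

$$\frac{f(x) - f(qx + (1 - q)a)}{x - (qx + (1 - q)a)} = \sum_{n=1}^{\infty} \frac{f^{(n)}(a)}{n!} \cdot \frac{q^n - 1}{q - 1}\, (x - a)^{n-1}. \tag{1.7}$$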
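
The q-analogue substitutions are easy to experiment with. The following is a minimal numerical sketch, assuming real q different from 1; the names `q_number`, `q_derivative`, and `shifted_quotient` are hypothetical and chosen only for this illustration. On a monomial the quotient in (1.5) collapses to a single term, and the quotient in (1.7) does the same for a power of (x − a).

```python
def q_number(n: int, q: float) -> float:
    """The q-analogue (q^n - 1)/(q - 1) of the integer n, for q != 1."""
    return (q**n - 1.0) / (q - 1.0)


def q_derivative(f, x: float, q: float) -> float:
    """The q-analogue (f(qx) - f(x))/(qx - x) of f'(x), for x != 0 and q != 1."""
    return (f(q * x) - f(x)) / (q * x - x)


def shifted_quotient(f, x: float, a: float, q: float) -> float:
    """Left-hand side of (1.7): a difference quotient between x and qx + (1 - q)a."""
    point = q * x + (1.0 - q) * a
    return (f(x) - f(point)) / (x - point)


if __name__ == "__main__":
    # For f(x) = x^5 the q-derivative is q_number(5, q) * x^4.
    print(q_derivative(lambda t: t**5, 2.0, 0.5), q_number(5, 0.5) * 2.0**4)
    # As q approaches 1 the q-number approaches the ordinary integer.
    print(q_number(5, 0.999999))
    # For f(x) = (x - a)^3 the quotient in (1.7) is q_number(3, q) * (x - a)^2.
    print(shifted_quotient(lambda t: (t - 1.0) ** 3, 3.0, 1.0, 0.5), q_number(3, 0.5) * 2.0**2)
```

Setting a = 0 in `shifted_quotient` gives `q_derivative`, and setting q = 0 gives the ordinary difference quotient of (1.6), which is the sense in which (1.7) contains both.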


Much of this paper requires no prerequisite beyond calculus. In some places a 
knowledge of elementary differential equations (and in a parenthetical aside in §3 
some rudimentary complex variable theory) would be helpful. Sophisticated readers 
may view the results of §3 in terms of Riccati equations and those of §5 (see 
especially (5.5)) in terms of Bernoulli equations. In §7 we use the “big O” notation 
and the important fact that 

$$\lim_{p \to \infty} \left( 1 + \frac{x}{p} + O\!\left( \frac{1}{p^2} \right) \right)^{p} = e^x. \tag{1.8}$$

2. UNIFICATION AND GENERALIZATION. Terms such as “grand unification” 
are bandied about more by physicists than by mathematicians, so we’ll begin with 
physics. Laws of physics are typically written down as equations, e.g. $E - mc^2 = 0$ 
and $\nabla \cdot B = 0$. The question has been raised in earnest as to whether all our 
knowledge of physics can be “unified” and reduced to one single equation. An 
answer to this is given in volume II of The Feynman Lectures on Physics [F-L-S, 
pp. 25-10, 25-11]. Let $F_1 = 0, F_2 = 0, \ldots, F_n = 0$ be an enumeration of all the 
equations expressing laws of physics. Then 

$$\sum_{i=1}^{n} F_i^2 = 0 \tag{2.1}$$
is the desired equation! Although initially amusing, I think most regard this as a 
brazen swindle. It really doesn’t tell us anything we didn’t already know. [This trick 
goes back well before Feynman. In fact, it does have some use in mathematical 
logic. For example, in [S, pp. 2-3] it is used to show that the solution of 
Diophantine equations of arbitrary degree is no harder than the solution of 
Diophantine equations of fourth degree. To get a feeling for this, consider the 
Diophantine equation $x^2 + y^2 = z^5 + 2$. If we set $v = z^2$ and $w = v^2$, so 
$wz + 2 = x^2 + y^2$, it has become a system of second degree equations. Combine them as 
above.] 
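
The reduction in the bracketed remark can be made concrete. The following is a small sketch, assuming integer arguments; the function names `original` and `combined` are hypothetical. It writes the degree-five equation and the single fourth-degree equation obtained by summing the squares of the three second-degree equations, as in (2.1).

```python
def original(x: int, y: int, z: int) -> int:
    """Zero exactly when x^2 + y^2 = z^5 + 2."""
    return x * x + y * y - (z**5 + 2)


def combined(x: int, y: int, z: int, v: int, w: int) -> int:
    """Sum of squares of the three second-degree equations.

    Zero exactly when v = z^2, w = v^2, and w*z + 2 = x^2 + y^2 all hold.
    """
    return (v - z * z) ** 2 + (w - v * v) ** 2 + (w * z + 2 - x * x - y * y) ** 2


if __name__ == "__main__":
    x, y, z = 5, 3, 2               # 25 + 9 = 34 = 2^5 + 2
    v, w = z * z, (z * z) ** 2      # the auxiliary unknowns are forced by z
    print(original(x, y, z), combined(x, y, z, v, w))
    print(combined(x, y, z, v, w - 1))   # a wrong auxiliary value gives a positive sum
```

A solution of the original equation yields a solution of the combined one by taking v = z² and w = z⁴, and any integer solution of the combined equation restricts to a solution of the original, because a sum of squares of integers vanishes only when every term does.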

Let’s now give illustrations (in mathematics) of good generalizations. It is known 
that 


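$$\int_0^{\infty} \frac{dx}{x^2 + 1} = \frac{\pi}{2} \tag{2.2}$$

and

$$\int_0^{\infty} \frac{\cos x}{x^2 + 1}\,dx = \frac{\pi}{2e}. \tag{2.3}$$

Both of these are consequences of

$$\int_0^{\infty} \frac{\cos \lambda x}{x^2 + a^2}\,dx = \frac{\pi}{2a}\, e^{-\lambda a}, \qquad a > 0,\ \lambda \ge 0. \tag{2.4}$$


Compare the informative nature of (2.4) with what you get from the scheme of 
(2.1). In fact (2.4) tells us much more than the two equations it generalizes; for one 
thing it suggests the Riemann-Lebesgue Lemma [Ap, pp. 469-470], that 

$$\lim_{\lambda \to \infty} \int_0^{\infty} g(x) \cos \lambda x \,dx = 0$$

for well-behaved functions $g(x)$. 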
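Another example is provided by Euler’s 

$$\sum_{m=-\infty}^{\infty} (-1)^m q^{m(3m-1)/2} = \prod_{n=1}^{\infty} (1 - q^n) \tag{2.5}$$

together with Gauss’ 

$$\sum_{n=-\infty}^{\infty} (-1)^n q^{n^2} = \prod_{m=1}^{\infty} \frac{1 - q^m}{1 + q^m}. \tag{2.6}$$

Compare what (2.1) produces with Jacobi’s elegant [An, p. 21] formula 

$$\sum_{n=-\infty}^{\infty} z^n q^{n^2} = \prod_{n=0}^{\infty} (1 - q^{2n+2})(1 + z q^{2n+1})(1 + z^{-1} q^{2n+1}). \tag{2.7}$$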
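
Identities of this kind can be sanity-checked by truncating both sides. The following is a rough numerical sketch, assuming real z ≠ 0 and 0 < q < 1 so that the truncations converge quickly; the function names and truncation lengths are choices made for this illustration. It compares the two sides of (2.7) and then specializes to z = −1, where the left side is the series in (2.6).

```python
def theta_sum(z: float, q: float, terms: int = 20) -> float:
    """Truncation of the sum over n of z^n q^(n^2), for n from -terms to terms."""
    return sum(z**n * q ** (n * n) for n in range(-terms, terms + 1))


def triple_product(z: float, q: float, factors: int = 100) -> float:
    """Truncation of the product on the right-hand side of (2.7)."""
    product = 1.0
    for n in range(factors):
        odd_power = q ** (2 * n + 1)
        product *= (1.0 - q ** (2 * n + 2)) * (1.0 + z * odd_power) * (1.0 + odd_power / z)
    return product


def gauss_product(q: float, factors: int = 100) -> float:
    """Truncation of the product on the right-hand side of (2.6)."""
    product = 1.0
    for m in range(1, factors + 1):
        product *= (1.0 - q**m) / (1.0 + q**m)
    return product


if __name__ == "__main__":
    q = 0.3
    print(theta_sum(2.0, q), triple_product(2.0, q))   # the two sides of (2.7)
    print(theta_sum(-1.0, q), gauss_product(q))        # z = -1 compared with (2.6)
```

Agreement of the truncated values is evidence, not proof; deducing (2.6) from (2.7) at z = −1 still needs Euler's identity relating the product of the factors 1 + q^m to the product over odd powers.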


There is, however, a bit to be said in favor of (2.1). It doesn’t introduce 
additional parameters, and one can recover the special cases rather easily. The 
additional parameters could of course be regarded as a bonus, but one does need a 
bit of cleverness to deduce (2.5) and (2.6) from (2.7). In fact, given propositions P 
and Q, it is not always easy to decide whether or not P is a generalization of Q. 
This difficulty will arise again in connection with the result of §6. For some further 
discussion of this see [B, p. 34]. 




We now turn to generalizations of inequalities. For example, can

$$\frac{2ab}{a+b} \le \sqrt{ab}, \qquad a, b > 0 \tag{2.8}$$
and
$$\frac{a+b}{2} \le \left(\frac{a^3+b^3}{2}\right)^{1/3}, \qquad a, b > 0 \tag{2.9}$$

be unified? From [B-B, pp. 17-18] we find that

$$\left(\frac{a^s+b^s}{2}\right)^{1/s} \le \left(\frac{a^t+b^t}{2}\right)^{1/t}, \qquad a, b > 0,\ -\infty < s < t < \infty. \tag{2.10}$$

By taking $s = 1$ and $t = 3$ we recover (2.9), but it is not obvious that (2.8) is a
special case.

One needs to take $s = -1$ and consider the limiting case $t \to 0$ (Hint: take
logarithms and apply L’Hôpital’s rule).
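To see how (2.8) and (2.9) sit inside the power-mean inequality (2.10), here is a small numerical sketch. The helper name `power_mean` and the sample values a = 2, b = 8 are assumptions chosen for illustration; the case t = 0 is defined as the geometric mean, which is the limit described in the hint above.

```python
import math


def power_mean(a, b, t):
    """Power mean ((a^t + b^t)/2)^(1/t), with the t -> 0 limit sqrt(ab)."""
    if t == 0:
        return math.sqrt(a * b)
    return ((a ** t + b ** t) / 2) ** (1 / t)


if __name__ == "__main__":
    a, b = 2.0, 8.0
    for t in (-1, 0, 1, 3):
        print(t, power_mean(a, b, t))
    # t = -1 is the harmonic mean 2ab/(a+b), t = 0 the geometric mean,
    # t = 1 the arithmetic mean, t = 3 the right side of (2.9).
```

The printed values increase with t, so the pair (t = -1, t = 0) reproduces (2.8) and the pair (t = 1, t = 3) reproduces (2.9).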

Is it true that any system of inequalities has a common generalization? Say
$F_1(a,b) \ge 0, F_2(a,b) \ge 0, \ldots, F_N(a,b) \ge 0$ for $(a,b)$ in some region $R$. Let

$$L_i(t) = \frac{(t-1)^2 (t-2)^2 \cdots (t-N)^2}{(t-i)^2}.$$

Then
$$\sum_{i=1}^{N} L_i(t)\,F_i(a,b) \ge 0, \qquad (a,b) \in R,\ t \text{ real} \tag{2.11}$$

is a common generalization. It is clearly true, and for $t = j$ it reduces to the $j$th
inequality $F_j(a,b) \ge 0$. However, this has all the faults of (2.1), and the additional
“fault” of introducing another parameter.


3. TWO INTEGRALS. Let $\delta > 0$. Then

$$\int_0^\infty \left(\frac{1}{x + \coth\delta}\right)^2 dx = \tanh\delta \tag{3.1}$$
and
$$\int_0^\infty \tanh^2(x+\delta)\,dx \ \text{ is divergent.} \tag{3.2}$$


These are quite simple, but it is not clear that they belong together. Is there a good 
common generalization? A small clue is provided by the theorem below that 
describes the solutions of a certain type of differential equation. 


Theorem. For $t$ real, let $f(t)$ and $g(t)$ be functions that are nonnegative and continuously
differentiable with $f(0) = 0$, $f'(t) > 0$ for $t > 0$, $g(t) > 0$ for $t \ge 0$, and

$$\lim_{x\to\infty} g(x) = 0. \tag{3.3}$$

Let $u_0 \ge 0$. If $u = u(x)$ satisfies

$$\frac{du}{dx} = g(x) - f(u), \qquad u(0) = u_0 \tag{3.4}$$

for $0 \le x < \infty$, then $u(x) \to 0$ and $u'(x) \to 0$ as $x \to \infty$.


Proof: If $u_0 = 0$ then $\left.\frac{du}{dx}\right|_{x=0} = g(0) - f(0) > 0$ and $u(x)$ is positive in some deleted
right neighborhood of 0; if $u_0 > 0$ this is true by continuity. If $u(x) = 0$ for some
$x > 0$ there is a smallest such $x$, say $x_0$. We then have that $u'(x_0) = g(x_0) - f(0) > 0$
and hence $u(x)$ is negative in some deleted left neighborhood of $x_0$. The
Intermediate Value Theorem now shows that $u(x_1) = 0$ for some $x_1$ with
$0 < x_1 < x_0$, a contradiction. Hence $u(x) > 0$ for $x > 0$. Now let $\varepsilon > 0$. If $u(x) \ge \varepsilon$
for all sufficiently large $x$, then

$$u'(x) = g(x) - f(u(x)) \le g(x) - f(\varepsilon) \le -f(\varepsilon)/2$$

for all sufficiently large $x$, since $f(t)$ is increasing and $g(x) \to 0$. This implies $u(x)$ will become
negative, a contradiction. Hence there is a sequence of real numbers $x_1 < x_2 < x_3 < \cdots$
with $x_n \to \infty$ such that $u(x_n) < \varepsilon$. Choose $n$ so large that $g(x) < \frac{1}{2}f(2\varepsilon)$
for $x > x_n$, and let $x^*$ be the smallest $x$ such that $x > x_n$ and $u(x) \ge 2\varepsilon$.
Clearly $x^*$ is strictly to the right of $x_n$. However

$$u'(x^*) < \tfrac{1}{2}f(2\varepsilon) - f(2\varepsilon) < 0,$$

so $u(x)$ is greater than $2\varepsilon$ in some deleted left neighborhood of $x^*$, a contradiction.
Hence $u(x) < 2\varepsilon$ for all $x > x_n$ and since $\varepsilon > 0$ was arbitrary, this says that
$u(x) \to 0$ as $x \to \infty$. It now follows from the differential equation that $u'(x) \to 0$
as $x \to \infty$.

We now consider the function $u(x)$ defined by

$$u'(x) = e^{-ax} - u^2(x), \qquad u(0) = \tanh\delta \tag{3.5}$$

where $\delta > 0$ and $0 < a < \infty$. [Here are some comments for readers concerned with
technical rigor. To see that a function exists for all $x \ge 0$ and is unique, observe
that the coefficients of the linear second order differential equation

$$v'' - e^{-ax}v = 0$$

have no finite singularities, so there is a general solution $v = c_1v_1(x) + c_2v_2(x)$ that
is analytic for all finite $x$ with the Wronskian of $v_1$ and $v_2$ never vanishing. This
means that some such $v$ can be found that satisfies $v'(0)/v(0) = \tanh\delta$, and a
straightforward calculation shows that $u(x) = v'(x)/v(x)$ satisfies the differential
equation. Moreover, $v'(x)/v(x)$ can have as singularities only simple poles, and if
some real $x_0 > 0$ is a pole of $u(x)$, then $u(x)$ (which is always positive) must
approach $+\infty$ as $x \to x_0$ from the left. However, $u'(x)$ is then always negative in a
deleted left neighborhood of $x_0$, a contradiction. Hence $u(x)$ exists for all $x \ge 0$;
its uniqueness is an easy consequence of the standard uniqueness theorem.]
By the previous Theorem, $\lim_{x\to\infty} u(x) = 0$ so

$$\int_0^\infty u^2(x)\,dx = \int_0^\infty e^{-ax}\,dx - \int_0^\infty u'(x)\,dx
= \frac{1}{a} - \left[\lim_{b\to\infty} u(b) - u(0)\right]
= \frac{1}{a} + \tanh\delta. \tag{3.6}$$

We recover (3.1) by letting $a \to \infty$ and (3.2) by letting $a \to 0$.
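The identity (3.6) can be checked numerically. The sketch below integrates (3.5) with a classical fourth-order Runge-Kutta step and compares a trapezoid estimate of the integral of $u^2$ over $[0, X]$ with the finite-range form of (3.6), namely $(1 - e^{-aX})/a - u(X) + \tanh\delta$. The function names, step count and the choice X = 50 are assumptions made for illustration; Runge-Kutta is used here only as a convenient solver and is not part of the article's argument.

```python
import math


def solve_u(a, delta, x_max=50.0, steps=50000):
    """Integrate u' = exp(-a x) - u^2, u(0) = tanh(delta), by classical RK4."""
    h = x_max / steps

    def rhs(x, u):
        return math.exp(-a * x) - u * u

    xs, us = [0.0], [math.tanh(delta)]
    for i in range(steps):
        x, u = xs[-1], us[-1]
        k1 = rhs(x, u)
        k2 = rhs(x + h / 2, u + h * k1 / 2)
        k3 = rhs(x + h / 2, u + h * k2 / 2)
        k4 = rhs(x + h, u + h * k3)
        xs.append((i + 1) * h)
        us.append(u + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return xs, us


def integral_of_u_squared(xs, us):
    """Trapezoid estimate of the integral of u^2 over the computed range."""
    total = 0.0
    for i in range(1, len(xs)):
        total += (xs[i] - xs[i - 1]) * (us[i] ** 2 + us[i - 1] ** 2) / 2
    return total


if __name__ == "__main__":
    a, delta = 1.0, 0.5
    xs, us = solve_u(a, delta)
    numeric = integral_of_u_squared(xs, us)
    finite_range = (1 - math.exp(-a * xs[-1])) / a - us[-1] + math.tanh(delta)
    print(numeric, finite_range, 1 / a + math.tanh(delta))
```

Because $u(x)$ decays only like $1/x$, the integral over a finite range approaches $1/a + \tanh\delta$ slowly; comparing against the finite-range identity avoids having to integrate very far out.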
4. A NONLINEAR GENERALIZATION OF HYPERBOLIC FUNCTIONS. Let
$x \ge 0$. Consider the following facts. If
$$y_1(x) = \sinh x \ \text{ and } \ y_2(x) = \cosh x \ \text{ then } \ y_2(x) - y_1(x) \to 0 \ \text{ as } x \to \infty. \tag{4.1}$$
If
$$y_1(x) = x \ \text{ and } \ y_2(x) = x + 1 \ \text{ then } \ y_2(x) - y_1(x) = 1 \ \text{ for all } x. \tag{4.2}$$
If
$$y_1(x) = \sqrt{x+1} \ \text{ and } \ y_2(x) = 2\sqrt{x+1} \ \text{ then } \ y_2(x) - y_1(x) \to \infty \ \text{ as } x \to \infty. \tag{4.3}$$

One might say that (4.1) is trivial while (4.2) and (4.3) insult your intelligence, but
what is a good common generalization?


Theorem. If $y_1 = y_1(x)$ and $y_2 = y_2(x)$ are functions satisfying $0 \le y_1(0) < y_2(0)$ and
the differential equations

$$\frac{dy_1}{dx} = Cy_2^a, \qquad \frac{dy_2}{dx} = Cy_1^a \tag{4.4}$$
where $C > 0$, then for some constant $C_0$
$$y_2(x) - y_1(x) \to \begin{cases} 0 & a > 0 \\ C_0 & a = 0 \\ \infty & a < 0 \end{cases} \tag{4.5}$$

as $x$ increases without limit in the (possibly infinite) domain of definition of $y_1(x)$ and
$y_2(x)$.


Before proving this we observe that the case $a = 0$ is trivial, while $a = 1$ is
straightforward since one then obtains simple linear differential equations for $y_1$
and $y_2$. In particular, for $C = 1$ and $0 = y_1(0) < y_2(0) = 1$, we have the familiar
equations

$$\frac{d}{dx}\sinh x = \cosh x, \qquad \frac{d}{dx}\cosh x = \sinh x. \tag{4.6}$$

For $a > 1$ it is not hard to show that there is a finite $x_0$ such that
$$\lim_{x\to x_0} y_1(x) = \lim_{x\to x_0} y_2(x) = \infty.$$

The theorem is at any rate plausible, since for $a > 0$ the smaller function always
has the larger derivative while the opposite is true for $a < 0$. However, the only
case other than $a = 0, 1$ in which the author can explicitly solve for $y_1$ and $y_2$ (and
thus establish the result directly) is $a = -1$; we leave this as a simple exercise that
“yields” (4.3). The idea of our proof is to exploit a relationship that generalizes the
familiar $\cosh^2 x - \sinh^2 x = 1$ relation that follows from (4.6) by differentiation.


Proof: Clearly $y_1$ and $y_2$ are non-decreasing. When $a > 0$ we have from $y_2(0) > 0$
that $y_1(x) \to \infty$ and hence that $y_2(x) \to \infty$. Say $a < 0$. If $y_1, y_2$ are bounded, their
derivatives will both have positive lower bounds, a contradiction. Hence one of
them will approach $\infty$ as $x$ increases. Now observe that for $a \ne -1$,

$$y_2(x)^{a+1} - y_1(x)^{a+1} = y_2(0)^{a+1} - y_1(0)^{a+1} = K \ne 0 \tag{4.7}$$

for some constant $K$: just differentiate! If $a > 0$ then $K > 0$ and the Mean Value
Theorem yields

$$K = (y_2 - y_1)(a+1)\xi^a$$
where $\xi$ is between $y_1(x)$ and $y_2(x)$, so

$$y_2 - y_1 \le \frac{K}{(a+1)\,y_1(x)^a} \to 0$$

as $x$ increases. For $-1 < a < 0$ set $z_1(x) = y_1^{1+a}$ and $z_2(x) = y_2^{1+a}$. Then again
$K > 0$ and

$$y_2(x) - y_1(x) = z_2^{1/(1+a)} - z_1^{1/(1+a)}
= (z_2 - z_1)\,\frac{\xi^{\frac{1}{1+a}-1}}{1+a}
\ge \frac{K}{1+a}\,z_1(x)^{\frac{-a}{1+a}}
= \frac{K}{1+a}\,y_1(x)^{-a},$$

where $\xi$ now lies between $z_1(x)$ and $z_2(x)$. In this case we have $y_2(x) \to \infty$, but this clearly implies that $y_1(x) \to \infty$ and the
result follows. Finally, say $a < -1$. Then

$$\frac{1}{y_2^{-a-1}} - \frac{1}{y_1^{-a-1}} = K < 0 \tag{4.8}$$
so both of $y_1$ and $y_2$ cannot approach infinity as $x$ increases. However, we know
that one of them must approach $\infty$. Clearly it is $y_2$, and the result follows.
We remark that if $a > -1$ and $y_1(0) = 0$, the ratio

$$t(x) = t(x; a, C) = y_1(x)/y_2(x)$$
will resemble the hyperbolic tangent in several ways. From (4.7) it is clear that
$t(x) \to 1$ as $x$ increases, and

$$D^2(y_1/y_2) = D(CK/y_2^2) = \frac{-2C^2Ky_1^a}{y_2^3} < 0$$

so the function is concave downwards. Also, the inequality $\tanh x < x$ has the
analogue

$$\frac{y_1}{y_2} < \frac{Cx}{y_2(0)^{1-a}}.$$
Finally, $t(x)$ solves the differential equation

$$y' = CK^{(a-1)/(a+1)}\left(1 - y^{a+1}\right)^{2/(a+1)}.$$

Let us say that we interpret this for $a = -1$ as the statement “$y$ is constant.”
Then we have the curious fact that for $a = -5, -3, -2, -1, 0, 1, 3$ the equation
can be integrated in terms of familiar functions (indeed elementary functions aside
from $-5$ and $3$ which require elliptic functions). Perhaps the “transelliptic”
integrals that $a = 2$ and $a = -4$ lead us to are worthy of special attention?
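The system (4.4) and the conserved quantity (4.7) are easy to explore numerically. The following sketch is an illustration under stated assumptions: the function name `integrate_pair`, the Runge-Kutta solver and the sample initial values are choices made here, not taken from the article. It reproduces the three model cases: a = 1 gives sinh and cosh, a = -1 gives the square-root pair of (4.3), and for a = 1/2 the quantity $y_2^{a+1} - y_1^{a+1}$ stays constant while the gap shrinks.

```python
import math


def integrate_pair(a, C, y1, y2, x_max, steps=20000):
    """Integrate y1' = C*y2**a, y2' = C*y1**a from x = 0 to x_max by RK4."""
    h = x_max / steps

    def rhs(p, q):
        return C * q ** a, C * p ** a

    for _ in range(steps):
        k1 = rhs(y1, y2)
        k2 = rhs(y1 + h / 2 * k1[0], y2 + h / 2 * k1[1])
        k3 = rhs(y1 + h / 2 * k2[0], y2 + h / 2 * k2[1])
        k4 = rhs(y1 + h * k3[0], y2 + h * k3[1])
        y1 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y2 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return y1, y2


if __name__ == "__main__":
    # a = 1: the hyperbolic functions of (4.1) and (4.6).
    print(integrate_pair(1, 1.0, 0.0, 1.0, 5.0), math.sinh(5.0), math.cosh(5.0))
    # a = -1: the square-root pair of (4.3), evaluated at x = 3.
    print(integrate_pair(-1, 1.0, 1.0, 2.0, 3.0))
    # a = 1/2: the invariant (4.7) and the shrinking gap.
    y1, y2 = integrate_pair(0.5, 1.0, 0.5, 1.0, 4.0)
    print(y2 ** 1.5 - y1 ** 1.5, 1.0 - 0.5 ** 1.5, y2 - y1)
```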


5. SOME HYPERBOLIC INEQUALITIES. The identities
$$\sinh 2u = 2\sinh u\cosh u \tag{5.1}$$
and
$$\cosh^2 x - \sinh^2 x = 1 \tag{5.2}$$
suggest (in very different ways) certain inequalities. First, what happens if each 2 is
replaced by $x$ in (5.1)? We shall see that if $0 \le x \le 2$ and $u \ge 0$, the left side is
then at most as large as the right side. Next, observe that integration of $1 \le \cosh x$
yields $x \le \sinh x$ and hence

$$x^2(\cosh^2 x - \sinh^2 x) \le \sinh^2 x. \tag{5.3}$$

A slight rearrangement here gives

$$a(x) = \frac{x}{\sqrt{1+x^2}} \le \tanh x. \tag{5.4}$$

This is fairly tight; numerical calculations indicate that the difference of the two
sides is at most .073688... (near x = 1.6219...). Both of the above functions are
concave downwards, behave like $x$ as $x \to 0$, and both approach 1 as $x \to \infty$.
Hence if $C$ is any constant and $f(x)$ denotes either of them, we expect

$$\int_0^x f(t)\,dt \sim x \sim x - C \sim \frac{x}{f(x)} - C$$

as $x \to \infty$. In fact we have the identity
$$\int_0^x a(t)\,dt = \frac{x}{a(x)} - 1. \tag{5.5}$$
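The tightness claim attached to (5.4) is simple to reproduce. The sketch below scans a uniform grid (the grid range, spacing and the helper name `gap` are assumptions made for this illustration) and reports where the difference between $\tanh x$ and $x/\sqrt{1+x^2}$ is largest.

```python
import math


def gap(x):
    """Difference tanh(x) - x/sqrt(1 + x^2) between the two sides of (5.4)."""
    return math.tanh(x) - x / math.sqrt(1 + x * x)


if __name__ == "__main__":
    grid = [i / 10000 for i in range(1, 100001)]  # 0.0001, 0.0002, ..., 10
    best_x = max(grid, key=gap)
    print(best_x, gap(best_x))
```

A grid search is crude but adequate here, since the gap is smooth, vanishes at both ends of the range, and has a single interior maximum.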


Now, what happens if in (5.5) we replace each $a(\cdot)$ by $\tanh(\cdot)$? We shall see that
here it is the right side that becomes smaller. Thus our program now is to find a
common generalization of

$$\sinh xu \le x\cosh u\sinh u, \qquad 0 \le x \le 2,\ 0 \le u \tag{5.6}$$
and
$$0 \le x\coth x - 1 \le \log\cosh x, \qquad 0 < x. \tag{5.7}$$
First, however, let’s show they are true. Inequality (5.6) may be rewritten as

$$f(u) = \sinh xu \le x\,\frac{\sinh 2u}{2} = g(u).$$

Here $f(0) = g(0)$ and $f'(u) \le g'(u)$ is
$$x\cosh xu \le x\cosh 2u,$$
and (5.6) follows. From the obvious inequality $\sinh x < \cosh x$ we deduce

$$\tanh x < 1 < \coth x. \tag{5.8}$$
A differentiation argument similar to the above (and even easier) shows that

$$\tanh x \le x \le \sinh x, \qquad x \ge 0. \tag{5.9}$$
Now the left side of (5.7) is nothing more than the left side of (5.9). The right side
of (5.7) is more subtle; it will be established by an appropriate introduction of a
$2\sinh x\cosh x$ term. The attempt to prove it by differentiation leads to the
perhaps unfamiliar inequality

$$\coth x - \frac{x}{\sinh^2 x} \le \tanh x; \tag{5.10}$$

is this true? The left side of (5.9) yields
$$2\cosh x\sinh x - 2x \le 2\cosh x\sinh x - 2\tanh x$$
and the right side of the above is (using $\cosh^2 x - \sinh^2 x = 1$)
$$2(\cosh^2 x - 1)\tanh x = \frac{2\sinh^3 x}{\cosh x}.$$
Cancelling twos gives
$$\cosh x\sinh x - x \le \tanh x\sinh^2 x$$

and division by $\sinh^2 x$ yields (5.10) and hence the result.
6. THE COMMON GENERALIZATION. Here we establish the

Theorem. Let $0 \le s \le 1$, $0 < u$, and $1 \le p$. Then

$$\left(1 + \frac{s}{p}\right)^p \le \left(\frac{\sinh (p+s)u}{\sinh pu}\right)^p \le \left(1 + \frac{s}{p}\right)^p \cosh pu. \tag{6.1}$$

Perhaps this does not seem to bear any relation to (5.6) and (5.7); we’ll address
this concern later. Since $\lim_{x\to 0} \sinh x/x = 1$, i.e., $\sinh x \sim x$ as $x \to 0$, the
theorem is certainly true in the limiting case $u \to 0$. For $p \to \infty$ it implies that the
middle term tends to $e^s$ provided that $u$ goes to zero faster than $1/p$.

The proof requires some preliminary results; the essential key to the proof is
Lemma 3.


Lemma 1. Let $x, u > 0$. Then

$$f(x) = \frac{\sinh x}{x} \quad\text{and}\quad g(x) = x\coth xu \tag{6.2}$$
are increasing in $x$.
Proof: We have
$$f'(x) = \frac{(\cosh x)(x - \tanh x)}{x^2} \ge 0 \quad\text{and}\quad g'(x) = \frac{(\sinh 2xu) - 2xu}{2\sinh^2 xu} \ge 0.$$

Lemma 2. Let $u \ge 0$, $p \ge 1$. Then
$$a(u) = \tanh pu \le p\tanh u = b(u). \tag{6.3}$$

Proof: Both sides are 0 when $u = 0$ and $a'(u) \le b'(u)$ is

$$\frac{p}{\cosh^2 pu} \le \frac{p}{\cosh^2 u}.$$
Lemma 3. For $u > 0$ and $p \ge 1$ we have

$$(p+1)\coth (p+1)u - p\coth pu \le \tanh pu. \tag{6.4}$$

Proof: The above may be written, in an obvious notation, as
$$\frac{C}{D} = \frac{1 + \tanh pu\tanh u}{\tanh pu + \tanh u} = \coth (p+1)u \le \frac{p^{-1/2}\tanh pu + p^{1/2}\coth pu}{p^{-1/2} + p^{1/2}} = \frac{A}{B}.$$
Thus the assertion is that $\coth (p+1)u$ is at most a certain weighted average of
$\tanh pu$ and $\coth pu$. The author does not find this at all obvious, but it happens to
be true that
$$\frac{AD - BC}{BD} = \frac{1}{BD}\,p^{-1/2}\left(1 - \tanh^2 pu\right)\left(\frac{p\tanh u}{\tanh pu} - 1\right) \ge 0;$$
recall (5.8) and Lemma 2!
We now prove the Theorem by logarithmic differentiation. It is also helpful to
make the replacement

$$\left(1 + \frac{s}{p}\right)^p = \frac{(p+s)^p}{p^p}$$

and associate $(p+s)^p$ and $p^p$ with $(\sinh (p+s)u)^p$ and $(\sinh pu)^p$ respectively.
The left inequality now follows easily from the property of $f(x)$ given in Lemma 1.
For the right side it suffices to show that

$$\frac{d}{du}\left[p\log\frac{\sinh (p+s)u}{p+s} - p\log\frac{\sinh pu}{p}\right] \le \frac{d}{du}\log\cosh pu,$$

i.e. that

$$(p+s)\coth (p+s)u - p\coth pu \le \tanh pu.$$
By the property of $g(x)$ given in Lemma 1, this would follow from the most
extreme case in which $s = 1$. But this is simply Lemma 3, and we are done!
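A grid check of the double inequality (6.1) gives some numerical reassurance about the statement. The sketch below is illustrative only: the function name `bounds`, the sample grid and the small relative tolerance (needed because equality holds at s = 0 and at p = s = 1) are assumptions introduced here.

```python
import math


def bounds(s, u, p):
    """Return the left, middle and right members of inequality (6.1)."""
    left = (1 + s / p) ** p
    middle = (math.sinh((p + s) * u) / math.sinh(p * u)) ** p
    right = left * math.cosh(p * u)
    return left, middle, right


def holds_on_grid(tol=1e-12):
    """Check (6.1) on a small grid of (s, u, p) values, up to rounding."""
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        for u in (0.1, 0.5, 1.0, 2.0, 3.0):
            for p in (1.0, 1.5, 2.0, 5.0, 10.0):
                left, middle, right = bounds(s, u, p)
                if left > middle * (1 + tol) or middle > right * (1 + tol):
                    return False
    return True


if __name__ == "__main__":
    print(bounds(0.5, 1.0, 2.0))
    print(holds_on_grid())
```

Such a check cannot replace the proof, but it is a useful guard against sign or exponent slips when the inequality is restated.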
7. IS IT A GENERALIZATION? If we set $p = 1$ the right inequality of (6.1)
becomes
$$\sinh (1+s)u \le (1+s)\sinh u\cosh u,$$

and upon setting $x = 1 + s$ we obtain (5.6). To extract (5.7) let $u = x/p$ and let
$p \to \infty$. Thus

$$e^s \le \lim_{p\to\infty}\left[\cosh\frac{sx}{p} + \sinh\frac{sx}{p}\coth x\right]^p \le e^s\cosh x.$$

Since
$$\cosh x = 1 + O(x^2) \quad\text{and}\quad \sinh x = x + O(x^3)$$
the expression raised to the $p$th power above is
$$\left[1 + \frac{sx\coth x}{p} + O\!\left(\frac{1}{p^2}\right)\right]^p \to e^{sx\coth x}$$
as $p \to \infty$, so

$$1 \le e^{sx\coth x - s} \le \cosh x;$$
now take logarithms.


8. CONCLUDING REMARKS. The proof of the Theorem hinged upon some 
rather good luck at one point (Lemma 3). Is it possible that there is an algorithm 
that will mechanically prove or disprove any such inequality? No one knows, but a 
wonderful result has recently been proved by A. J. Macintyre and A. J. Wilkie 
[M-W]. If Schanuel’s conjecture regarding transcendental numbers is true, then any 
inequality 
$$F(x_1, \ldots, x_n) \ge 0,$$

where the function $F$ is formed by any finite number of rational operations and
real exponentiations, is decidably true or false! Schanuel’s conjecture is that the
transcendence degree of

$$K = \mathbb{Q}(\alpha_1, \ldots, \alpha_m, e^{\alpha_1}, \ldots, e^{\alpha_m}),$$
which we denote by tdeg $K$, is at least $m$ when $\alpha_1, \ldots, \alpha_m$ are linearly independent
over the rational field $\mathbb{Q}$. (In particular tdeg $\mathbb{Q}(\sqrt{2}, e^{\sqrt{2}}) \ge 1$, so one of $\sqrt{2}$ and $e^{\sqrt{2}}$
is not algebraic: this particular case of the conjecture, and a few others, are already
known.) Thus every proof of a hyperbolic inequality can be regarded as evidence
(perhaps infinitesimal evidence) in favor of Schanuel’s conjecture.

We end by asking if we can decide whether two hyperbolic inequalities, say
$H_1 \ge 0$ and $H_2 \ge 0$, have a “good” common generalization. This seems hard even
to formulate; for one thing neither “good” nor “generalization” has been formally
defined. Moreover, if $H_3 \ge 0$ is any further “unrelated” inequality, then it could
be claimed that the scheme of (2.11) applied to $H_1$, $H_2$ and $H_3$ yields a truly
informative common generalization of $H_1$ and $H_2$. The right way of posing the
problem remains to be found.
Answer to Picture Puzzle
(p. 594) 


Wendy and Alex Robertson—not the only man and wife team of 
mathematicians on record, but one of the few such. 
Transforming n-gons by Folding
the Plane 


P. Sabinin and M. G. Stone 


1. INTRODUCTION. A number of authors have investigated labelled polygonal 
rings in the plane, and the reconstruction of these rings from distorted images. 
This reconstruction problem originates in Biology, where the accepted model views 
chromosomes as the vertices of a polygonal ring ([1], [2]). Invasive experimental
techniques appear to distort this ring structure. Folding is one possible form of 
distortion, and we examine the effect of folding on the distribution of points in a 
(regular) polygonal ring. Folding the plane transforms a regular n-gon into a set of 
n (not necessarily distinct) images. We prove, for example, that every set of three 
points in the plane is the image of a suitable equilateral triangle under a single 
fold. Every four points can be obtained from a suitable square by a sequence of at 
most three folds. Some experimentation leads to the natural conjecture that each 
arbitrary n-point configuration is the image of a suitable regular n-gon under 
finitely many folds. We prove that this is true and establish bounds for the
required number of folds. 


2. THREE POINTS. We consider transformations of the plane obtained by fold- 
ing the plane along a line. A fold along a line L leaves each point fixed in one half 
plane determined by L, and reflects each point in the other half plane into its 
mirror image in L. If a point x is transformed under a fold, we will denote the 
image of x by x’, and rely upon context to identify the precise nature of the fold 
itself. Line segments connecting points a and b will be denoted by ab, and the 
length of ab is |ab|. We indicate that x belongs to the line segment ab by writing 
x ∈ ab.

We are interested first in the locus of points which are accessible from some 
given point c by a single fold. Without further restrictions, clearly every point d is 
accessible from c by a single fold along L, the perpendicular bisector of cd. If we 
further require that another point a is fixed under the fold, then exactly those 
points d which are interior to (including those on the boundary of) the circle with 
center a and radius r = |ac| are so accessible. If a certain pair of points a and b 
are both to remain individually fixed under the fold, then the points accessible 
from c by such a fold are precisely those which lie in the “lens” determined by c 
between a and b, as indicated in Figure 1. Folding c to d, for d in the (shaded) 
lens determined by c between a and b, leaves both a and b fixed. This is easily
seen since L, the perpendicular bisector of cd, fails to meet ab; indeed cd is part of 
a chord on the circles which determine the lens, and the bisector of such a chord 
passes through the endpoint of ab which is the center for the circle in question. 
Figure 1. The “lens” determined by c between a and b.


More generally: 


Lemma 1. The points accessible from c by a single fold which leaves $a_1, \ldots, a_n$ fixed
are exactly those points interior to, or on the boundary of, the intersection of the circles
$C_i$ through c with centers at $a_i$, $i = 1, \ldots, n$.


Proof: The Lemma is obvious for n = 1, 2. Consider k + 1 points where k ≥ 2.
Let $d \in \bigcap_{i=1}^{k+1} C_i$ and fold c to d along L, the perpendicular bisector of cd. Note
for each $j = 1, 2, \ldots, n$ that $d \in C_j$ so the perpendicular bisector of cd passes
through the line segment $ca_j$ (including the endpoints), hence the fold along L
leaves each point $a_j$ fixed, $j = 1, \ldots, n$. □


It is also useful to observe: 


Lemma 2. For any fold along a line L we have |ab| ≥ |a'b'|, that is, the distance
between any two points can only decrease under folding.


Proof: This is an easy consequence of the triangle inequality. □


Theorem 1. Given any three points in the plane, a, b,c, there is an equilateral triangle 
T with vertices x, y,z, for which a, b, and c are the images of x, y, and z under a 
single fold. 


Proof: Label the points so that |ab| is the largest distance between any two of the 
three given points. The remaining point c then lies inside one half of the lens 
which is the intersection of two circles of radius |ab| having centers at a and b 
respectively. (See Figure 2.) Let T be either of the two equilateral triangles 
determined by the segment ab, with vertices x =a, y =b, and z. Notice that a 
fold along the line L which is the perpendicular bisector of cz takes z to c and 
leaves both a and b fixed by Lemma 1. □
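The fold operation and the construction in the proof of Theorem 1 can be sketched in code. Everything named below (`fold`, `equilateral_preimage`, the tuple representation of points, the sample coordinates) is an assumption introduced for illustration; the only ingredients taken from the text are the definition of a fold along the perpendicular bisector of cz and the labelling that makes |ab| the largest distance.

```python
import math


def fold(point, src, dst):
    """Fold the plane along the perpendicular bisector of the segment src-dst.

    The half-plane containing src is reflected, so src lands on dst; points
    on the other side of the fold line, or on it, stay where they are.
    """
    nx, ny = dst[0] - src[0], dst[1] - src[1]
    norm2 = nx * nx + ny * ny
    if norm2 == 0:
        return point  # degenerate fold: nothing moves
    mx, my = (src[0] + dst[0]) / 2, (src[1] + dst[1]) / 2
    s = (point[0] - mx) * nx + (point[1] - my) * ny  # negative on src's side
    if s >= 0:
        return point
    t = 2 * s / norm2
    return (point[0] - t * nx, point[1] - t * ny)


def equilateral_preimage(p, q, r):
    """Label three points as in Theorem 1 and return (a, b, c, z).

    |ab| is the largest pairwise distance and z is the apex of one of the
    two equilateral triangles erected on the segment ab.
    """
    a, b, c = max([(p, q, r), (p, r, q), (q, r, p)],
                  key=lambda trio: math.dist(trio[0], trio[1]))
    dx, dy = b[0] - a[0], b[1] - a[1]
    cos60, sin60 = 0.5, math.sqrt(3) / 2
    z = (a[0] + dx * cos60 - dy * sin60, a[1] + dx * sin60 + dy * cos60)
    return a, b, c, z


if __name__ == "__main__":
    a, b, c, z = equilateral_preimage((0.0, 0.0), (4.0, 0.0), (1.0, 1.0))
    print([fold(v, z, c) for v in (a, b, z)])
```

In the sample run the fold that carries z to c leaves a and b where they are, as the proof predicts, so the equilateral triangle abz is mapped onto the three given points.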


3. FOUR POINTS. We conjecture that every four points are the images of the 
vertices of a suitable square under at most two folds. We prove two weaker results: 


Theorem 2. Given any four points in the plane a, b, c, and d, there is some square S
with vertices x, y, z, and w for which a, b, c, and d are the images of x, y, z, and w
under a sequence of at most three folds.


Figure 2. Points accessible from an Equilateral Triangle.


Proof: There are three cases according to the configuration of the four given
points: the points may be the vertices of a rectangle, or be collinear, or neither of
these. If the points form a rectangle, obviously one fold will suffice. If the four
points are collinear, we construct a square whose diagonal contains a, b, c, and d
and has one of the two extreme points, say a, as a vertex. Choose the square
sufficiently large that all four points lie in one half of the diagonal (see Figure 3a).
By Lemma 1, a fold of the opposite diagonal corner z to the remaining extreme
point d leaves w and y fixed. Each of w and y may then be folded to b and c to
complete the sequence.


Figure 3. (a) Four Collinear Points. (b) A Quadrilateral. (c) One Interior Point.

Finally, in the remaining case, we may assume there are four non-collinear 
points, and that one interior angle of their convex hull is less than 90°. Label these 
points a, b, c, and d so that a is the vertex at the apex of a minimal interior angle. 
Ensure also that the point diagonally opposite (if the convex hull is a quadrilateral)
or interior (if the hull is a triangle) is labelled c. We then construct a square with
vertices x, y, z, w with a = x, such that the diagonal of the square bisects the
interior angle at a (see Figure 3b and 3c). 

By choosing the length of the side of the square to be sufficiently large, we can 
also guarantee the following conditions: 


1. Each of the perpendicular bisectors of ad and ab intersects each of the sides
of the square xy and xw.

2. In case abcd is a quadrilateral, the perpendicular bisector of the diagonal ac 
also intersects both xy and xw. 

3. The entire convex region abcd is contained within the lower half of the lens 
determined by z, between y and w. 


Now the square xyzw may be folded to produce abcd as follows: 

First, fold z to the point closest to the diagonal xz. 

Second, fold y to the remaining point closest to y. 

Finally, fold w to the only remaining point. 

Lemma 1 guarantees, using conditions 1, 2, and 3, that each of these folds
leaves a, b, c, d fixed. Thus each successive fold fails to interfere with the work of
the previous folds, and x, y, z, w are taken to a, b, c, d by a sequence of three folds. □


Theorem 3. Any four collinear points are the images of the vertices of a suitable square 
under, at most, two folds. 


Proof: Let the four collinear points be labelled a, b, c, d. By a suitable change in
the labelling, we may assume a and d are the extreme points; b, c ∈ ad; and the
points are labelled so that |ac| ≤ |ab| and |ac| ≤ |bd|. That is, we assume that |ac|
is the shortest distance between an interior point and the extreme points a and d
(the length |ac| is exaggerated in Figure 4 to display, more clearly, the
relationship between other points).

We introduce coordinates, as shown in Figure 4, with |ad| = 1 representing unit
length. The idea is simple: we are to fold the point labelled q at (1, 0) to c leaving
a and d fixed. Then fold the image p' of the point p = (0, 0) to b with a second
fold which leaves a, c, and d fixed as well. The first part of this is easy: fold q to c
by folding along $L_2$, the perpendicular bisector of the line segment qc; c lies within
the lens determined by q between a and d, and c as well as a and d remain fixed
under this fold.


Figure 4. Four collinear points, appropriately labelled.


To assist in describing the image p' of p = (0, 0) under the first fold, we note
that the line $L_1$ through c = (k, 1) and q = (1, 0) has parametric form
$(x, y) = \left(k + t,\ 1 - \frac{1}{1-k}\,t\right)$ and the line $L_2$ is given by $(x, y) = \left(\frac{k+1}{2} + t,\ \frac{1}{2} + (1-k)t\right)$.
We deduce that the y intercept for $L_2$ is $k^2/2$, and we shall use this to help
estimate |ap'|. The real problem here is to see that |ap'| is large enough for p' to
reach b in the next fold, while leaving a fixed.

Next, we note that $k = |ac| \le \frac{1}{2}$; this follows from |ac| ≤ |bd| since $|ac| > \frac{1}{2}$
yields $|bd| > \frac{1}{2}$ and |ad| = |ac| + |bc| + |bd| > 1. Thus $k^2 \le k/2$. Also,
$|ap'| + (k^2/2) + (k^2/2) > |ap| = |ad|$, since the length of the path a → p' → $(0, k^2/2)$ → p
is clearly longer than the direct path |ap|. We conclude that
$|ap'| > |ad| - k^2 \ge |ad| - k/2$. Moreover, $|ad| - (k/2) \ge |ab|$ since |bd| ≥ |ac|. Finally, then,
|ap'| > |ab|, and b is accessible from p' by a fold which leaves a itself fixed, since
b lies within the lens determined by p' between a and d. This same fold leaves
both c and d fixed as well. Thus the square is transformed, by a sequence of two
folds, into the four given collinear points. □
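The coordinate estimate in the proof of Theorem 3 can be verified directly. The sketch below assumes the coordinates suggested by the text, namely a = (0, 1), d = (1, 1), p = (0, 0), q = (1, 0) and c = (k, 1); the helper names are hypothetical. It reflects p in the perpendicular bisector of qc and checks that the image p' satisfies $|ap'| > 1 - k^2$ for $0 < k \le 1/2$.

```python
import math


def reflect_across_bisector(point, src, dst):
    """Mirror image of `point` in the perpendicular bisector of src-dst."""
    nx, ny = dst[0] - src[0], dst[1] - src[1]
    mx, my = (src[0] + dst[0]) / 2, (src[1] + dst[1]) / 2
    t = 2 * ((point[0] - mx) * nx + (point[1] - my) * ny) / (nx * nx + ny * ny)
    return (point[0] - t * nx, point[1] - t * ny)


def image_of_p(k):
    """Return p' and |ap'| after folding q = (1, 0) onto c = (k, 1)."""
    a, p, q, c = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0), (k, 1.0)
    p_image = reflect_across_bisector(p, q, c)
    return p_image, math.dist(a, p_image)


if __name__ == "__main__":
    for k in (0.1, 0.25, 0.5):
        p_image, length = image_of_p(k)
        print(k, p_image, length, 1 - k * k)
```

For k = 1/2 this gives p' = (-0.1, 0.2), so |ap'| is about 0.806, comfortably above the bound 0.75.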


4. BOUNDS FOR n ≥ 5. Theorems 1 and 2 lead to the natural conjecture that
every n-point set is the image of the vertices of some regular n-gon under finitely 
many folds. We first prove this directly for n = 5. 


Theorem 4. Every five points are the images of the vertices of a suitable regular
pentagon under at most five folds.


Proof: Observe that the lens determined by each vertex between the two adjacent 
vertices represents the set of points accessible from the given vertex by a fold 
which (by Lemma 1) leaves all four remaining vertices fixed. It is only necessary to 
choose a sufficiently large pentagon, to insure that the five points we wish to 
produce lie within the region common to the five lenses (see Figure 5). A total of 
five folds will suffice for the pentagon. □


Figure 5. The Pentagon


Theorem 5. Every six points are the images of the vertices of a regular hexagon under
at most seven folds.


Proof: The procedure is similar to that for the pentagon. Here the lenses deter- 
mined by each vertex between the two adjacent vertices meet in only one point. 
Choose a hexagon sufficiently large to contain all six required points within a disc 
inside a single lens, between the center and some vertex a. Fold the opposite 
vertex b to the center b’. Label one of the points nearest to the center inside the 
lens b", and label that nearest to a as a’. Now fold all four remaining vertices 
from the hexagon to the four unlabeled points by folding, first, those vertices 
formerly adjacent to b, and then those adjacent to a. (Lemma 1 guarantees that 
this will not disturb a or b’.) Finally, fold a to a’ and b’ to b”. This procedure 
requires seven folds. □


Lemma 3. Every regular n-gon can be transformed into a set of n distinct collinear
points by a finite sequence of folds. At most $\frac{1}{2}n$ folds are required when n is even, and
at most $\frac{1}{2}(n - 1)$ folds are required when n is odd.


Proof: For n even, choose a suitable line $L_1$ through the center of the n-gon so
that folding about $L_1$ produces n distinct points. (There are infinitely many lines
through the center, and only finitely many of them fail to have this property.)
“Stratifying” the images in layers parallel to $L_1$, and folding each layer to the one
below will collapse all points to a collinear set, as shown in Figure 7a. Observe that
for n even, the complete process requires $\frac{1}{2}n$ folds. For n odd, the polygon may be
easily stratified in horizontal layers and collapsed to a collinear set by orienting the
polygon so that one side is horizontal, as in Figure 7b. For n odd, this process
requires $\frac{1}{2}(n - 1)$ folds. □


Figure 6. The Hexagon


Lemma 4. Let $\{a_1, \ldots, a_n\}$ be a set of n points contained in an open disc of diameter
d. If $\{x_1, \ldots, x_n\}$ is a collinear set of n points with $x_1 = a_1$ and $|x_ix_j| \ge 2d$ (for
$j = i + 1$, $i = 1, 2, \ldots, n - 1$) then there is a sequence of (n - 1) folds which
transforms $\{x_1, \ldots, x_n\}$ into $\{a_1, \ldots, a_n\}$.


Figure 7a. Figure 7b.


Proof: We proceed by induction on n. If n = 2 the statement is true, since $a_2$ lies
in the circle with center at $x_1 = a_1$, which passes through $x_2$. If we assume the
assertion of the Lemma to be true for n with 2 ≤ n ≤ k, observe that we may fold
$x_1, \ldots, x_k$ to $a_1, \ldots, a_k$ with (k - 1) folds, and the last such fold takes $x_{k+1}$ to
some point p at a distance at least 2d from $x_k$. It follows that the circle of
diameter d which contains $\{a_1, \ldots, a_k\}$ lies inside the circle through p with center
at $a_k$, and the fold along the perpendicular bisector of $a_{k+1}p$ takes p to $a_{k+1}$
while leaving $a_1, \ldots, a_k$ fixed. Thus k folds suffice to transform $x_1, \ldots, x_{k+1}$ into
$a_1, \ldots, a_{k+1}$. □


Figure 8. Folding n collinear points (with $|x_ix_{i+1}| \ge 2d$) to a small set.


Theorem 6. Every set of n points is the image of a suitable regular n-gon, under at
most finitely many folds. For n even the number of folds does not exceed $\frac{1}{2}(3n - 2)$,
and for n odd the number of folds does not exceed $\frac{1}{2}(3n - 3)$.


Proof: Let S be a set of n points and assume S ⊂ C, a circle of diameter d.
Choose a regular n-gon P sufficiently large so that the minimum distance between
$x_1, \ldots, x_n$ (the collinear points constructed from Lemma 3) is at least 2d. Locate
the polygon in the plane so that $x_1$ coincides with one member of S. Now apply
Lemma 4 to transform these collinear points into S. The total number of folds
required is then given by Lemmas 3 and 4. For n even, the number of folds does
not exceed $\frac{1}{2}n + (n - 1)$ and for n odd, the number of folds does not exceed
$\frac{1}{2}(n - 1) + (n - 1)$. □
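The arithmetic behind the bounds in Theorem 6 is just the sum of the counts from Lemmas 3 and 4. A short sketch (the function names are illustrative assumptions) tabulates the bound and compares it with the direct constructions of Theorems 4 and 5:

```python
def collinear_folds(n):
    """Folds used in Lemma 3: n/2 for even n, (n - 1)/2 for odd n."""
    return n // 2


def fold_bound(n):
    """Upper bound of Theorem 6: Lemma 3 folds plus the n - 1 folds of Lemma 4."""
    return collinear_folds(n) + (n - 1)


if __name__ == "__main__":
    for n in range(3, 11):
        print(n, fold_bound(n))
```

For n = 5 and n = 6 the general bound gives 6 and 8 folds, one more than the 5 and 7 folds obtained directly in Theorems 4 and 5.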


Observe that Theorem 6 has a more general analog which may be proved easily
using the techniques employed in Lemmas 3 and 4. If R is a set of n distinct points,
then every set of n points is the image of a suitable R' similar to R, under at most
2(n - 1) folds. At most (n - 1) folds are required to produce a collinear set from
R, and a suitably large copy R' can be reduced to the given set by at most (n - 1)
additional folds.


5. AN OPEN PROBLEM. For n = 4,5, 6,7,..., it remains an intriguing problem 
to determine the least number of folds required to produce arbitrary n-point 
configurations from the vertices of a suitable n-gon. In particular, it would be nice 
to settle the question whether or not two folds suffice in general to produce four 
arbitrary points from the square. 


ACKNOWLEDGMENT. The authors wish to thank the referee for a suggested improvement to Lemma 
3 and for drawing our attention to the observation which follows Theorem 6. 
A strict separation must be maintained between physics and mathematics.
Physics must remain quite independent; it must use all its
powers of love, respect, and reverence to find its way into nature and
the sacred life of nature irrespective of what mathematics does. The
latter, on the other hand, must declare itself independent of all
externalities, take its own path of intellect, and develop in a purer
way than it now does in working with the physical world to gain
something from it or impose something on it.


Goethe, Scientific Studies, translated by Douglas Miller.
Isometries of the Plane


David A. Singer 


The purpose of this note is to develop an elementary proof of the classification of 
isometries of the Euclidean and hyperbolic planes. The proof is constructive and 
elementary; it illustrates the distinction between the two geometries by presenting 
a Euclidean construction which in the non-Euclidean case breaks down at the last 
step, revealing the horocycle rotation. The proof is developed in the framework of 
absolute geometry (geometry without the parallel postulate) as far as this can be 
pushed. 

The basic result concerning rigid motions is the following, which can be found in 
[1, p. 46] for the Euclidean plane: 


Theorem 1. Every rigid motion of the “plane” is of one of the following types:


i) Rotations about a fixed point P;

ii) Translations in the direction of a line l;
iii) Reflection across a line l;
iv) Glide-reflections along a line l.


The first type has P as a fixed point; the others have l as an invariant line (that is,
T(l) = l).


Note. The quotation marks above are to remind the reader that in order for this to 
really be a theorem, one has to be sure of the meaning of the word “plane” in the 
statement.

Before reading the proof of this result, it is helpful to consider the effect of 
composing two isometries. The theorem implies that the composition of two 
rotations must be a rotation or a translation. (It can’t be a reflection or glide- 
reflection.) This is not intuitively obvious, although it is hard to visualize a 
counter-example. 
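
The statement about composing two rotations can be checked numerically in the Euclidean case. The sketch below is only an illustration, not part of the proof: it identifies the plane with the complex numbers (a convenience assumed here), stores an orientation-preserving isometry as a pair (a, b) meaning z -> a*z + b with |a| = 1, and uses the hypothetical helper names `rotation`, `compose`, `classify`, and `fixed_point`.

```python
import cmath
import math

def rotation(center, theta):
    """Rotation by angle theta about center, stored as (a, b) with z -> a*z + b."""
    a = cmath.exp(1j * theta)
    return a, center - a * center

def compose(f, g):
    """Return the map 'f after g' for maps stored as (a, b)."""
    (a1, b1), (a2, b2) = f, g
    return a1 * a2, a1 * b2 + b1

def classify(f, tol=1e-12):
    """Name the type of the orientation-preserving isometry z -> a*z + b."""
    a, b = f
    if abs(a - 1) < tol:
        return "identity" if abs(b) < tol else "translation"
    return "rotation"

def fixed_point(f):
    """Center of the rotation z -> a*z + b (requires a != 1)."""
    a, b = f
    return b / (1 - a)

r1 = rotation(0 + 0j, math.pi / 2)
r2 = rotation(1 + 0j, math.pi / 3)
r3 = rotation(1 + 0j, -math.pi / 2)

print(classify(compose(r2, r1)))  # the angles add up to 5*pi/6
print(classify(compose(r3, r1)))  # the angles cancel but the centers differ
```

When the two angles do not cancel, the composite has a single fixed point, which `fixed_point` recovers; when they cancel and the centers differ, no fixed point exists and the composite is a translation.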

To prove the theorem, we need a couple of elementary facts about rigid 
motions. First of all, given three points determining a triangle, there is a unique 
rigid motion which carries these three points to a specified congruent triangle. 
(This might well be taken as the definition of congruence.) 

Given three non-collinear points A, B, C and two points A', B' with lengths
|AB| = |A'B'|, there are exactly two rigid motions which carry A to A' and B to
B', determined by the two choices of C' making △ABC congruent to △A'B'C'.

Finally, from these observations we see that if an isometry T of the plane has
three non-collinear fixed points, it is the identity; if it has two fixed points it is
either the identity or the reflection through the line determined by these fixed
points.

Now let T be an isometry of the “plane.” (The quotation marks are to
emphasize that I am not going to say whether this is the Euclidean plane or the
hyperbolic plane until absolutely forced to do so. The pedagogical advantage of
this approach is that it dramatizes the need for precise and rigorous arguments.)
Choose any point $x_0$ in the plane for which $T(x_0) \ne x_0$. Let $x_1 = T(x_0)$ and let
$x_2 = T(x_1)$.

It may happen that $x_2 = x_0$. In this case the line l' determined by $x_0$ and $x_1$ is
invariant under T and the midpoint A of the segment from $x_0$ to $x_1$ is a fixed
point. Choosing any point B on the line l through A perpendicular to l', it is
immediate that l is also invariant and that T is either a reflection through l (if B
is fixed) or rotation through 180° around A (if T(B) = B' is on the opposite side
of l'). [See Figure 1.]


Figure 1 


Note that choosing $x_0$ anywhere in the plane yields $x_2 = x_0$ in both of these
cases. Thus we have classified the involutions of the plane ($T^2$ = identity), and we
can see that if $x_2 = x_0$, then the choice is no accident!

Now we may assume that $x_0$, $x_1$, and $x_2$ are distinct points. If they are collinear,
determining a line l, then T must either be the glide reflection along l or the
translation along l carrying the points $x_0$ and $x_1$ to the points $x_1$ and $x_2$. (There
can only be two rigid motions carrying $x_0$ to $x_1$ and $x_1$ to $x_2$, and we have named
two.)

We are reduced to the expected situation, namely that $x_0$, $x_1$, and $x_2$ form a
triangle. Let A be the midpoint of $x_0x_1$ and B the midpoint of $x_1x_2$. Let
$x_3 = T(x_2)$ and let C be the midpoint of $x_2x_3$. We have B = T(A) and C = T(B).
If A, B, and C are collinear (Figure 2), then inspection of the figure shows that T
must be a glide-reflection along this line (a reflection followed by a translation
along the reflecting line). Conversely, if T is a glide reflection then this collinearity
always occurs.


Figure 2


We are down to one final situation, in which the points A, B and C are
non-collinear. To show that T is a rotation, we must find the fixed point. But it is
clear where to look for it! Construct the perpendicular l to $x_0x_1$ through A and
the perpendicular l' to $x_1x_2$ through B. The map T takes l to l'. If O is the point
of intersection of l and l', then O must be a fixed point and T a rotation around
O. (See Figure 3.)
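
In the Euclidean case the construction just described can be carried out numerically. The following sketch is an illustration under stated assumptions: the plane is identified with the complex numbers, T is taken to be a Euclidean rotation (so that l and l' really do meet), and the helper names `rotation` and `fixed_point_by_construction` are hypothetical.

```python
import cmath

def rotation(center, theta):
    """Return the Euclidean rotation by theta about center as a function on complex numbers."""
    a = cmath.exp(1j * theta)
    return lambda z: a * (z - center) + center

def fixed_point_by_construction(T, x0):
    """Intersect l (perpendicular to x0x1 at its midpoint A) with l' (perpendicular to x1x2 at B)."""
    x1 = T(x0)
    x2 = T(x1)
    A = (x0 + x1) / 2
    B = (x1 + x2) / 2
    d1 = 1j * (x1 - x0)          # direction of l
    d2 = 1j * (x2 - x1)          # direction of l'
    rhs = B - A
    # Solve A + s*d1 = B + t*d2 for real s and t by Cramer's rule.
    det = d2.real * d1.imag - d1.real * d2.imag
    if abs(det) < 1e-12:
        raise ValueError("l and l' do not meet in a single point")
    s = (d2.real * rhs.imag - rhs.real * d2.imag) / det
    return A + s * d1

T = rotation(2 + 1j, 0.7)
O = fixed_point_by_construction(T, 5 - 3j)
print(O, abs(T(O) - O))
```

The `ValueError` branch marks exactly the step that depends on the two perpendiculars meeting; the code handles only the Euclidean situation and says nothing about the hyperbolic one.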
Figure 3 


Up until this very last sentence, the reasoning above has been equally valid in 
the Euclidean case and the hyperbolic case. However, the existence of O requires the 
parallel postulate. Indeed, this illustrates the danger of arguing by pictures. The 
flaw in the argument lies in the assumption that Figure 3 must look the way it 
does. 

Before formulating the correct theorem for the hyperbolic case, it will be 
helpful to review the behavior of lines. Recall that in hyperbolic geometry there 
are three ways a pair of lines l and l' can look. They can intersect at some point P.
They can be “parallel,” diverging from each other in one direction while approach- 
ing each other asymptotically in the other direction. Such lines are said to meet at 
an “ideal point,” which can be thought of as an “endpoint at infinity” of the line. 
When the hyperbolic plane is modeled by the Poincaré disc, these ideal points 
form the boundary circle. 

Finally, they can be “hyperparallel,” diverging from each other in both direc- 
tions. Such lines have the property that they have a common perpendicular, which 
represents the shortest line segment between the two lines. Given a point A not
on a line l, there are infinitely many lines through A not meeting l; of these, two
are parallel and the others are hyperparallel. (See [2], pp. 179-187, [3], p. 156, for
information about parallels and common perpendiculars.)


Theorem 2. Every rigid motion of the hyperbolic plane is one of the following types:


i) Rotations about a fixed point P;
ii) Translations in the direction of a line l;
iii) Reflection across a line l;
iv) Glide-reflections along a line l;
v) Horocycle rotations.


The first type has P as a fixed point; the second and third have l as an invariant line
and have two ideal points fixed (the “endpoints” of l). The fifth kind has one ideal
point fixed; it has no invariant lines.


To prove this result by completing the earlier argument, we need to do some 
work with hyperparallel lines. 

Consider the rays of l and l' on the side of the angle $\angle Ax_1B$. Let $Z_0$ be the
point on l at the foot of the perpendicular from B to l. Imagine a moving point Z
on l, sliding along (away from A) starting at $Z_0$. (See Figure 4.) As Z moves out
along l the perpendicular through Z meets l' at a moving point W. If $Z_2$ is farther
out than $Z_1$, then angle $\angle Z_2W_2B$ is smaller than $\angle Z_1W_1B$ (since the sum of the
angles in the quadrilateral $Z_1Z_2W_2W_1$ is less than four right angles).


Figure 4 


It is possible that for some choice of Z the perpendicular m through Z is also
perpendicular to l' at W. In that case, a symmetry argument shows that |AZ| =
|BW|. (I leave this as an exercise.) Therefore, translation along m will carry l to l'
and thus T is that translation. (This reveals the remarkable fact that in the
hyperbolic case, a translation has exactly one invariant line; it is characterized as
the unique common perpendicular to the lines l and l'.)

If, however, the lines l and l' are parallel, then no such common perpendicular
exists. Instead, the two lines approach each other, “meeting” at an ideal point.
Since T carries l to l', this ideal point is “fixed” by the map. More simply, if l'' is a
common parallel to l and l', then T(l'') must also be a common parallel, so that
the family of (one-sided) parallels to l is invariant under T. T is therefore a
“rotation” about the ideal point. (See [1], p. 269, where “parallel displacement” is
discussed.) It is an elementary (but interesting!) exercise to show that no line is
fixed under T in this case.

This theorem may be used as a jumping-off point for further exploration. For 
example, similar arguments will show that in the (double) elliptic case, T is always 
a rotation. What are the analogous theorems in the 3-dimensional case? 

Another direction in which fruitful exploration is available is through the notion
of an orbit. If T is an isometry of the plane and $x = x_0$ is any point, then the orbit
of x is the sequence of points $x_0, x_1, x_2, x_3, \ldots$ defined by $x_{n+1} = T(x_n)$. In the
Euclidean plane, such an orbit consists either of one point, two distinct points,
finitely many points equally spaced on a circle, or infinitely many points lying on a
line or circle or on two lines. What can be said about the curves in hyperbolic
space (or in higher dimensions) on which the orbits lie?
NOTES 


Edited by: John Duncan 


One More Construction Which 
Is Impossible 


V. A. Geyler 


The purpose of this note is to present a new “elementary” geometrical problem 
which cannot be solved if one is required to use a straightedge and compass only. 
At first glance it seems very surprising that a problem like this needs more 
advanced tools. 


Problem. To construct a secant of a given circle which divides the area of the circle 
into two commensurate but unequal parts. 


That is, let a circle of unit radius and a positive rational number $q \ne 1$ be given
(see Figure 1). We want to construct a secant AB such that the ratio of the areas
of the two complementary segments APB and AP'B is equal to q. Recall that a
(circular) segment is the region bounded by an arc and the subtending chord.


Figure 1


We will show that this construction is impossible if a straightedge and compass 
are the only tools which we may use. We precede the proof with a couple of 
definitions and facts which will be needed. A complex number is algebraic if it is a 
zero of a polynomial with integer coefficients. Numbers which are not algebraic are 
called transcendental. The set of all algebraic numbers is a field. This implies, in 
particular, that for any algebraic number a and any natural number k the number
$a^k$ is also algebraic. Recall also that each constructible number, i.e., a number
which can be constructed using a straightedge and compass only, is algebraic. A
crucial result for us is the following theorem due to Lindemann. If $x \ne 0$ is an
algebraic number, then $e^x$ is a transcendental number. We refer to [2, 3] for
unexplained terminology and details. 
Proof of our claim: We use the notation introduced in Figure 1. Assume, contrary
to what we claim, that there exists a rational number n/m, where m, n are positive
integers with $1 \le n < m$, for which we can construct, with straightedge and
compass, a secant AB of the circle which divides the area in the ratio n : m. Let us
denote by $A_1$ and $A_2$ the areas of the segments APB and AP'B respectively, and
by $\alpha$ the central angle AOB which is assumed to be less than $\pi$. The constructibility
of the secant AB implies immediately that the numbers $\sin(\alpha/2)$ and $\cos(\alpha/2)$
are constructible, and therefore $\sin\alpha = 2\sin(\alpha/2)\cos(\alpha/2)$ is algebraic (even
constructible).

On the other hand, we have that $2A_1 = \alpha - \sin\alpha$ and $2A_2 = 2\pi - \alpha + \sin\alpha$,
and consequently

\[ A_1/A_2 = (\alpha - \sin\alpha)/(2\pi - \alpha + \sin\alpha) = n/m. \]

This implies that $(m + n)\alpha - 2\pi n = (m + n)\sin\alpha$. Thus, the number
$x = (m + n)\alpha - 2\pi n$ is algebraic, and clearly $x \ne 0$ as $\sin\alpha > 0$. By Lindemann's theorem
$e^{ix} = e^{i(m+n)\alpha}$ is a transcendental number, and hence $e^{i\alpha}$ is a transcendental
number, too. Since the numbers $\cos\alpha$ and $\sin\alpha$ are either both algebraic or both
transcendental, and since $e^{i\alpha} = \cos\alpha + i\sin\alpha$, we can conclude that $\sin\alpha$ is a
transcendental number, a contradiction.
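
The algebraic identity used in the proof can be sanity-checked numerically. The sketch below is an illustration only and plays no role in the argument: it finds the central angle by bisection for an assumed sample ratio n : m = 1 : 3 and compares both sides of $(m + n)\alpha - 2\pi n = (m + n)\sin\alpha$. The function name `central_angle` is hypothetical.

```python
import math

def central_angle(n, m, iters=200):
    """Bisection for the angle alpha in (0, pi) whose two segments have areas in ratio n : m (n < m)."""
    target = n / m
    lo, hi = 0.0, math.pi
    for _ in range(iters):
        mid = (lo + hi) / 2
        # Ratio of the segment areas: (alpha - sin alpha) / (2*pi - alpha + sin alpha).
        ratio = (mid - math.sin(mid)) / (2 * math.pi - mid + math.sin(mid))
        if ratio < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, m = 1, 3
alpha = central_angle(n, m)
lhs = (m + n) * alpha - 2 * math.pi * n
rhs = (m + n) * math.sin(alpha)
print(alpha, lhs, rhs)
```

Bisection is adequate here because the area ratio increases with the angle on (0, π), so the solution is unique. The numerical value of the angle says nothing about constructibility; that is the content of the proof.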


Remark 1. Lindemann’s theorem is a very deep result, which seems to be much 
deeper than the problem it is applied to. Therefore it would be interesting to find 
out if there exists a more elementary proof of the fact proved above without using 
Lindemann’s theorem. 


Remark 2. Here we quickly discuss a few easy (and possibly known) results 
following from Lindemann’s theorem and related to the ideas above. For brevity 
we use the word “construct” to mean “construct with ruler and compass.” 

First of all, we cannot construct a nonzero angle $\alpha$ whose radian measure is
algebraic. (Indeed, a central angle $\alpha$ so constructed on the unit circle determines a
chord with length $2\sin(\alpha/2)$ and the latter number is algebraic. However, as we
showed above [$\alpha$ algebraic] ⇒ [$2\sin(\alpha/2)$ transcendental], a contradiction.)

It follows that we cannot construct a sector (area = $\alpha/2$) or a segment (area =
$(\alpha - \sin\alpha)/2$ = [transcendental] − [algebraic]) of the unit circle whose area is a
nonzero algebraic number. In other words, we can state the well known impossibility
of squaring the circle in a more general way, by saying that it is impossible to
square a nonzero constructible sector or segment of the unit circle.

Finally, following a comment by the referee, we would like to mention a very 
interesting theorem of Newton which seems quite relevant. Each smooth oval, in 
particular a circle, is algebraically non-integrable. This means that there is no non-zero 
polynomial P satisfying P(S, a, b,c) = 0, where S is the area of the segment cut off by 
the straight line ax + by = c. We refer to [1, pp. 84-85] for details regarding this 
fact. 

In conclusion the author would like to thank the referee for his suggestions, and 
also M. Frantz and J. Sarkar for their help in preparing the manuscript. 
An Inductive Proof of a Mixed 
Arithmetic-Geometric Mean Inequality 


Takashi Matsuda 


Let $x_1, \ldots, x_n$ be positive real numbers. The arithmetic-geometric mean inequality
is

\[ y_n = \frac{x_1 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 \cdots x_n} = z_n, \tag{1} \]

and a mixed arithmetic-geometric mean inequality is

\[ a_n = \frac{z_1 + \cdots + z_n}{n} \le \sqrt[n]{y_1 \cdots y_n} = w_n. \tag{2} \]

Equality holds if and only if $x_1 = \cdots = x_n$.

A proof of (2) was given by Kiran Kedlaya [1]; it was combinatorial. In this note
we propose to give an inductive proof of (2) that uses a little analysis. The present
proof is suitable for the advanced undergraduate student.

We shall first give a proof of the following lemma.



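Lemma. Let

\[ S = \{(u_1, \ldots, u_{m+1}) : u_1 + \cdots + u_{m+1} = (m + 1)a,\ u_1 > 0, \ldots, u_{m+1} > 0\}, \]

where a is a positive constant and let

\[ f(u_1, \ldots, u_{m+1}) = \frac{1}{m+1}\left[u_1 + \frac{u_2^2}{u_1} + \cdots + \frac{u_{m+1}^{m+1}}{u_m^m}\right]\left(\frac{u_1 + \cdots + u_m}{m}\right)^m \qquad (m = 1, 2, \ldots). \]

Then

\[ f(u_1, \ldots, u_{m+1}) \ge a^{m+1} \quad \text{for every } (u_1, \ldots, u_{m+1}) \in S. \tag{3} \]

Equality holds if and only if $u_1 = \cdots = u_{m+1}$.

Proof: If m = 1, then we have

\[ f(u_1, u_2) = \frac{u_1^2 + (2a - u_1)^2}{2} \ge a^2. \]

If $m \ge 2$, then let

\[ g(u_1, \ldots, u_{m+1}) = \frac{1}{m+1}\left[2u_2 + \frac{u_3^3}{u_2^2} + \cdots + \frac{u_{m+1}^{m+1}}{u_m^m}\right]\left(\frac{u_1 + \cdots + u_m}{m}\right)^m. \]

Since $u_1 + u_2^2/u_1 \ge 2u_2$, we have

\[ f(u_1, \ldots, u_{m+1}) \ge g(u_1, \ldots, u_{m+1}). \tag{4} \]

Let p be a positive integer and let

\[ S_p = \left\{(u_1, \ldots, u_{m+1}) : u_1 + \cdots + u_{m+1} = (m + 1)a,\ u_1 \ge \frac{a}{p+1}, \ldots, u_{m+1} \ge \frac{a}{p+1}\right\}. \]

Since g is continuous and $S_p$ is a compact set, g has a maximum and a minimum
value on $S_p$. For $u_1 > 0, u_2 > 0, \ldots, u_{m+1} > 0$, let

\[ h(u_1, \ldots, u_{m+1}) = g(u_1, \ldots, u_{m+1}) - \lambda[u_1 + \cdots + u_{m+1} - (m + 1)a]. \]

We proceed by computing

\[ \frac{\partial h}{\partial u_1}, \ldots, \frac{\partial h}{\partial u_{m+1}} \quad \text{and} \quad \frac{\partial h}{\partial \lambda} \]

and setting each of these expressions equal to zero. We obtain

\[ \frac{\partial h}{\partial u_1} = \frac{1}{m+1}\left[2u_2 + \cdots + \frac{u_{m+1}^{m+1}}{u_m^m}\right]\left(\frac{u_1 + \cdots + u_m}{m}\right)^{m-1} - \lambda = 0, \tag{5} \]

\[ \frac{\partial h}{\partial u_2} = \frac{2}{m+1}\left(1 - \frac{u_3^3}{u_2^3}\right)\left(\frac{u_1 + \cdots + u_m}{m}\right)^m + \frac{\partial h}{\partial u_1} = 0, \tag{6} \]

\[ \frac{\partial h}{\partial u_k} = \frac{k}{m+1}\left(\frac{u_k^{k-1}}{u_{k-1}^{k-1}} - \frac{u_{k+1}^{k+1}}{u_k^{k+1}}\right)\left(\frac{u_1 + \cdots + u_m}{m}\right)^m + \frac{\partial h}{\partial u_1} = 0 \qquad (k = 3, \ldots, m), \tag{7} \]

\[ \frac{\partial h}{\partial u_{m+1}} = \frac{u_{m+1}^m}{u_m^m}\left(\frac{u_1 + \cdots + u_m}{m}\right)^m - \lambda = 0, \tag{8} \]

\[ \frac{\partial h}{\partial \lambda} = -[u_1 + \cdots + u_{m+1} - (m + 1)a] = 0. \tag{9} \]

From (5), (6) and (7), we have

\[ u_2 = \cdots = u_m = u_{m+1}. \]

If we set $u_2 = \cdots = u_{m+1} = t$, then from (5) and (8), we have $u_1 = t$ and using
(9), we get

\[ u_1 = u_2 = \cdots = u_{m+1} = a. \]

Thus, the system of equations

\[ \frac{\partial h}{\partial u_1} = \cdots = \frac{\partial h}{\partial u_{m+1}} = \frac{\partial h}{\partial \lambda} = 0 \]

has one and only one solution $(a, \ldots, a)$ and $(a, \ldots, a) \in S_p$. This critical point
obtained by the Lagrange multiplier rule must give the maximum or the minimum.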
Thus,

\[ g(a, \ldots, a) = a^{m+1} \]

is the maximum value or the minimum value.
Let $u_1 = \cdots = u_m = x$, x some positive number. Then $u_{m+1} = (m + 1)a - mx$
and

\[ g(x, \ldots, x, (m + 1)a - mx) = \frac{m x^{m+1} + [(m + 1)a - mx]^{m+1}}{m+1}. \]

If $a/(p + 1) < x < a + \frac{p}{m(p+1)}a$, then $(x, \ldots, x, (m + 1)a - mx) \in S_p$.
Also, if $x \ne a$, then

\[ (x, \ldots, x, (m + 1)a - mx) \ne (a, \ldots, a, a). \]

An easy application of differential calculus establishes the following inequality

\[ g(x, \ldots, x, (m + 1)a - mx) > a^{m+1} \qquad \left(\frac{a}{p+1} < x < a + \frac{p}{m(p+1)}a,\ x \ne a\right). \tag{10} \]

Hence, $a^{m+1}$ is the minimum value of g on $S_p$. Since $S_p \uparrow S$ as p becomes
arbitrarily large, we have

\[ g(u_1, \ldots, u_{m+1}) \ge a^{m+1} \quad \text{for every } (u_1, \ldots, u_{m+1}) \in S. \]

From (4), we therefore also have

\[ f(u_1, \ldots, u_{m+1}) \ge a^{m+1} \quad \text{for every } (u_1, \ldots, u_{m+1}) \in S. \]

It is clear that equality holds if and only if $u_1 = \cdots = u_{m+1}$.
This proves the lemma. ∎


Proof of (2): Inequality (2) is equivalent to

\[ a_n^n = \left(\frac{z_1 + \cdots + z_n}{n}\right)^n \le y_1 \cdots y_n = w_n^n. \tag{11} \]

Let $a = a_{m+1}$ in the lemma; then $(z_1, \ldots, z_{m+1}) \in S$. Furthermore, it is quite easy
to see that

\[ f(z_1, \ldots, z_{m+1}) = y_{m+1}\left(\frac{z_1 + \cdots + z_m}{m}\right)^m. \]

Let m = 1. Using the lemma, we have

\[ a_2^2 \le y_2 z_1 = y_1 y_2 = w_2^2. \]

Evidently, equality holds if and only if $x_1 = x_2$. Hence (11) is true for n = 2.
Mathematical induction now readily gives the general result (11).

Thus, assume that (11) is valid for n. Let m = n. Using the lemma and the
induction hypothesis, we have

\[ a_{n+1}^{n+1} \le y_{n+1}\left(\frac{z_1 + \cdots + z_n}{n}\right)^n \le y_1 \cdots y_n\, y_{n+1} = w_{n+1}^{n+1}. \tag{12} \]

Evidently, $x_1 = \cdots = x_n$ is equivalent to $z_1 = \cdots = z_n$. Hence, equality in the
second inequality of (12) holds if and only if $z_1 = \cdots = z_n$.
Therefore, equality holds simultaneously in both inequalities of (12) if and only
if $z_1 = \cdots = z_{n+1}$, i.e., $x_1 = \cdots = x_{n+1}$.
Thus we have established the mixed arithmetic-geometric mean inequality (2).
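
Inequality (2) is easy to test numerically. The sketch below is a sanity check on random data, not a substitute for the proof; the function name `mixed_means` and the sampling ranges are assumptions made for the illustration. It computes the arithmetic means $y_k$ and geometric means $z_k$ of the initial segments and compares $a_n$ with $w_n$.

```python
import math
import random

def mixed_means(xs):
    """Return (a_n, w_n): the mean of the geometric means z_k and the geometric mean of the arithmetic means y_k."""
    n = len(xs)
    y = [sum(xs[:k]) / k for k in range(1, n + 1)]
    z = [math.exp(sum(math.log(x) for x in xs[:k]) / k) for k in range(1, n + 1)]
    a_n = sum(z) / n
    w_n = math.exp(sum(math.log(v) for v in y) / n)
    return a_n, w_n

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.1, 10.0) for _ in range(random.randint(1, 8))]
    a_n, w_n = mixed_means(xs)
    # A small relative tolerance absorbs floating-point rounding.
    assert a_n <= w_n * (1 + 1e-12)

print(mixed_means([1.0, 2.0, 3.0]))
print(mixed_means([2.0, 2.0, 2.0, 2.0]))   # equal entries: both means coincide
```

Geometric means are computed through logarithms to avoid overflow in long products; for the short lists used here a direct product would work equally well.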
The Ranks of Tournament Matrices 


T. S. Michael 


1. TOURNAMENT MATRICES. A round-robin tournament is held among n
players. Altogether there are n(n − 1)/2 matches, and we assume that no match
ends in a draw. The results are recorded in an n by n tournament matrix $A = [a_{ij}]$
as follows. Label the players with the indices 1, ..., n and define

\[ a_{ij} = \begin{cases} 1 & \text{if player } i \text{ defeats player } j, \\ 0 & \text{otherwise.} \end{cases} \]

Thus the tournament matrix A satisfies

\[ A + A^T + I = J, \tag{1} \]

where I is the n by n identity matrix and J is the n by n matrix of all 1's.
Conversely, any matrix A of 0's and 1's that satisfies (1) is a tournament matrix.
The score $s_i$ is the number of players defeated by player i. A tournament is regular
provided each player has the same score. For $i \ne j$ the joint score $s_{ij}$ is the number
of players defeated by both player i and player j.

Because each element of A is 0 or 1, we may view A as a matrix over any field. 
In a note in this Monthly, de Caen [1] uses elementary linear algebra to establish a 
general inequality for the rank of a tournament matrix. 


Proposition (de Caen). If A is an n by n tournament matrix, then rank(A) ≥ (n −
1)/2 over any field. If A is a regular n by n tournament matrix and rank(A) = (n −
1)/2, then the characteristic of the field divides (n − 1)/2.


We shall prove de Caen’s inequality in a manner that allows us to characterize 
the case of equality both algebraically and combinatorially. In particular, we shall 
see that the hypothesis of regularity may be dropped in the second assertion of the 
Proposition. Here is our main result. 
Theorem 1. Let A be an n by n tournament matrix. Then

\[ \operatorname{rank}(A) \ge (n - 1)/2 \tag{2} \]

over any field F. Equality holds if and only if n is odd and $AA^T = O$. Moreover,
equality implies that the characteristic of F divides (n − 1)/2.


In §3 we construct tournament matrices that achieve the lower bound in (2). We 
remark that ranks of tournament matrices over finite fields may be used in an 
important combinatorial problem—the testing of block designs for isomorphism 
[2]. 

Observe that if A is a tournament matrix, then the (i, i) element of $AA^T$ is
equal to the score $s_i$, while the (i, j) element ($i \ne j$) is equal to the joint score $s_{ij}$.
This observation is a consequence of the row-by-column definition of matrix 
multiplication and the manner in which tournament matrices are defined. Thus 
from Theorem 1 we immediately obtain the following combinatorial description of 
the case of equality in (2). 


Corollary. An n by n tournament matrix A satisfies rank(A) = (n − 1)/2 over the
field F if and only if n is odd and the characteristic of F divides every score and every
joint score of the tournament corresponding to A.
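
Theorem 1 and the Corollary can be checked exhaustively for very small tournaments. The following sketch is an illustration under stated assumptions: it works only over the prime fields of integers modulo p for a few chosen primes, enumerates all tournaments on at most five players, and uses the hypothetical helper names `rank_mod_p`, `tournaments`, `scores_vanish`, and `check`.

```python
from itertools import combinations, product

def rank_mod_p(rows, p):
    """Rank of an integer matrix over the integers modulo a prime p, by Gaussian elimination."""
    M = [[x % p for x in row] for row in rows]
    n_rows = len(M)
    n_cols = len(M[0]) if M else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, n_rows) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], -1, p)          # modular inverse (Python 3.8+)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(n_rows):
            if r != rank and M[r][col]:
                factor = M[r][col]
                M[r] = [(a - factor * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def tournaments(n):
    """Yield every n-by-n tournament matrix as a list of rows."""
    pairs = list(combinations(range(n), 2))
    for outcome in product((0, 1), repeat=len(pairs)):
        A = [[0] * n for _ in range(n)]
        for (i, j), i_wins in zip(pairs, outcome):
            if i_wins:
                A[i][j] = 1
            else:
                A[j][i] = 1
        yield A

def scores_vanish(A, p):
    """True when p divides every score and joint score, i.e. every entry of A A^T."""
    n = len(A)
    return all(sum(A[i][k] * A[j][k] for k in range(n)) % p == 0
               for i in range(n) for j in range(n))

def check(max_n=5, primes=(2, 3, 5)):
    """Verify inequality (2) and the equality criterion for all small tournaments."""
    for n in range(1, max_n + 1):
        for A in tournaments(n):
            for p in primes:
                r = rank_mod_p(A, p)
                assert 2 * r >= n - 1
                assert (2 * r == n - 1) == (n % 2 == 1 and scores_vanish(A, p))
    return True

print(check())
```

The enumeration grows as 2^(n(n−1)/2), so this brute-force check is only practical for a handful of players.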


2. PROOF OF THEOREM 1. By the rank-plus-nullity theorem, inequality (2) is
equivalent to

\[ \operatorname{nullity}(A) \le (n + 1)/2. \tag{2'} \]

Equality holds in (2) if and only if equality holds in (2'). We shall work with (2')
throughout our argument.

The case n = 1 is trivial. Henceforth suppose that $n \ge 2$. Let $u = (1, \ldots, 1)^T$
denote the column vector of n 1's, and let $e_i$ denote the ith unit coordinate vector
(i = 1, ..., n). Then Ju = nu and $Je_i = u$ for i = 1, ..., n. Thus over any field F
the only possible nonzero eigenvalue of J is n, and the corresponding eigenspace
would be $\langle u \rangle$.

Let $N_A$ denote the nullspace of A. Suppose that $v \in N_A \cap N_{A^T}$ and $v \ne 0$. Then
$Jv = (A + A^T + I)v = v$ by (1). Hence v is an eigenvector of J corresponding to
the eigenvalue 1. By the preceding comments, n = 1 in F, and $v \in \langle u \rangle$. Thus the
dimension of the subspace $N_A \cap N_{A^T}$ is at most 1. Hence

\[ \operatorname{nullity}(A) + \operatorname{nullity}(A^T) \le n + \dim(N_A \cap N_{A^T}) \le n + 1. \]

Now nullity(A) = nullity($A^T$), and therefore inequality (2') is true.

Suppose that equality holds in (2'). Of course, n must be odd. Also, the
nullspaces $N_A$ and $N_{A^T}$ have intersection $\langle u \rangle$ and dimension (n + 1)/2. Thus
there are bases $\{u, v_1, \ldots, v_{(n-1)/2}\}$ and $\{u, w_1, \ldots, w_{(n-1)/2}\}$ for $N_A$ and $N_{A^T}$,
respectively, whose union spans $F^n$. To prove that $AA^T = O$ it suffices to prove
that $AA^T$ annihilates these two bases. Clearly, $AA^T u = 0$ and $AA^T w_i = 0$ for each
i. Also, AJ = O and hence by (1)

\[ AA^T v_i = A(J - A - I)v_i = (AJ - A(A + I))v_i = -(A + I)Av_i = 0 \]

for each i. Therefore $AA^T = O$. Let the characteristic of F be p. The sum of the
diagonal elements of $AA^T$ is the sum of the scores of the tournament, i.e.,
n(n − 1)/2. Hence $AA^T = O$ implies that p divides n(n − 1)/2, and thus p
divides (n − 1)/2, as n = 1 in F.

Conversely, suppose that n is odd and $AA^T = O$. Then the nullspace of A
contains the column space of $A^T$. Thus

\[ \operatorname{nullity}(A) \ge \operatorname{rank}(A^T) = \operatorname{rank}(A) = n - \operatorname{nullity}(A), \]

and so nullity(A) ≥ n/2. Also, nullity(A) ≤ (n + 1)/2 by (2'). Because n is odd,
the only possibility is that nullity(A) = (n + 1)/2. ∎


3. DOUBLY REGULAR TOURNAMENTS. A tournament among n players (n ≥
3) is doubly regular provided the joint scores are all equal, say, to m − 1, where m
is a positive integer. In a doubly regular tournament the set of players defeated by
player i yields a regular tournament in which each player defeats m − 1 others
(i = 1, ..., n). Thus a doubly regular tournament is regular with each score equal
to 2m − 1 and with 4m − 1 players. Let $A_m$ be a doubly regular 4m − 1 by
4m − 1 tournament matrix. Then we have shown that

\[ A_m A_m^T = mI + (m - 1)J. \tag{3} \]


Theorem 2. Let $A_m$ be a doubly regular 4m − 1 by 4m − 1 tournament matrix, and
let F be a field of characteristic p, where p divides m. Write n = 4m + 1. Then the n by
n tournament matrix


has rank (n — 1)/2 over F. 


Proof: Over the field of rational numbers 


m(I+J) 
AAT = 


by (3). Hence $AA^T = O$ over F. Also, n is odd. Therefore rank(A) = (n − 1)/2 by
Theorem 1. ∎


Doubly regular 4m — 1 by 4m — 1 tournament matrices exist for infinitely 
many values of m. (See §7 of [3] for a plethora of constructions.) Thus Theorem 2 
shows that equality holds infinitely often in Theorem 1. 
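
A small instance of equality in Theorem 1 can be produced by computer. The sketch below rests on two assumptions that are not taken from the text: the doubly regular tournament is the quadratic-residue (Paley) tournament on 7 players (so m = 2), and it is extended to n = 4m + 1 = 9 players by adding one player who loses every match and one who wins every match. This extension was chosen so that every score and joint score is even; it is one construction consistent with Theorem 1 and is not claimed to be the matrix displayed in Theorem 2. All function names are hypothetical.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over the integers modulo a prime p, by Gaussian elimination."""
    M = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][col], -1, p)          # modular inverse (Python 3.8+)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][col]:
                factor = M[r][col]
                M[r] = [(a - factor * b) % p for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

def paley_tournament(q):
    """Tournament on Z_q (q prime, q = 3 mod 4): i defeats j when j - i is a nonzero square mod q."""
    squares = {(x * x) % q for x in range(1, q)}
    return [[1 if (j - i) % q in squares else 0 for j in range(q)] for i in range(q)]

def extend(Am):
    """Assumed extension: append a player x who loses to everyone and a player y who beats everyone."""
    k = len(Am)
    rows = [row + [1, 0] for row in Am]   # each original player beats x and loses to y
    rows.append([0] * (k + 2))            # x defeats nobody
    rows.append([1] * (k + 1) + [0])      # y defeats all original players and x
    return rows

A = extend(paley_tournament(7))           # m = 2, n = 9
n = len(A)
print(rank_mod_p(A, 2), (n - 1) // 2)     # rank over the field with two elements, and the bound
print(rank_mod_p(A, 3))                   # 3 does not divide m, for comparison
```

Over the field with two elements every score and joint score of this 9-player tournament is even, so the Corollary predicts that the rank equals the lower bound (n − 1)/2 = 4.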
On Some Applications of Fibonacci Numbers 


David L. Ranum 


The Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...) as defined by the familiar
recurrence


F_0 = 0,
F_1 = 1,
F_i = F_(i−1) + F_(i−2) for all i ≥ 2 (1)


are an important part of mathematics legend. Although they are no longer utilized
as a population model for rabbits, the sequence often emerges in unexpected and
interesting places. In mathematics textbooks, their relationship (Eq. 2) to the
golden ratio and its conjugate (shown as an approximation in Eq. 3) appears as an
initial nontrivial example of proof by induction.


F_i = (Φ^i − Ψ^i)/sqrt(5) where Φ = 1.61803... and Ψ = −0.61803... (2)
F_i ≈ Φ^i/sqrt(5) (3)
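
Equations (1), (2) and (3) can be compared directly. The sketch below is a minimal illustration in Python (the article itself uses Pascal-like pseudocode later on); the names `fib`, `fib_closed`, and `fib_approx` are hypothetical, and floating-point arithmetic limits the closed form to moderate values of i.

```python
import math

PHI = (1 + math.sqrt(5)) / 2      # golden ratio, about 1.61803
PSI = (1 - math.sqrt(5)) / 2      # its conjugate, about -0.61803

def fib(i):
    """F_i from the recurrence (1), computed iteratively."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

def fib_closed(i):
    """F_i from Eq. (2), rounded to absorb floating-point error."""
    return round((PHI ** i - PSI ** i) / math.sqrt(5))

def fib_approx(i):
    """The approximation of Eq. (3)."""
    return PHI ** i / math.sqrt(5)

for i in range(11):
    print(i, fib(i), fib_closed(i), round(fib_approx(i), 3))
```

Because the conjugate has absolute value below 1, the term dropped in Eq. (3) shrinks geometrically, which is why rounding the approximation already recovers F_i.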


The computer science community also considers these values to have great 
importance. It is almost a guarantee that in the first introduction to recursion, a 
student will be subject to this very same recurrence as a recursive function to 
compute the nth Fibonacci number. However, this interest goes much further than 
a simple algorithmic example. In fact, the Fibonacci numbers and their associated 
properties have been used often in the development and analysis of data structures 
and algorithms. 


BINARY TREES AND BINARY SEARCH TREES. A binary tree is a data struc- 
ture defined on a finite set of nodes (values) that is either: 

(1) empty, or 

(2) consists of a root node with the remaining nodes divided into two disjoint
sets (called the left and right subtrees) each of which is a binary tree. 

Figure 1 shows an example of a binary tree constructed from seven nodes. The 
root of the entire tree is the node with value ‘E.’ The node containing ‘B’ is known 
as the left child of ‘E’ and similarly the node containing ‘G’ is known as the right
child of ‘E.’ Note that the left child of ‘E’ also serves as the root of the left subtree
from the recursive definition (likewise for the right). The node labeled ‘C’ is an
example of a leaf node as it serves as a root for 2 empty subtrees. The height of the 
tree is calculated as the number of levels (or generations) within the tree. In Figure 
1 then, the height is said to be 4. 
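
The recursive definition translates directly into code. The sketch below is an illustration in Python; the `Node` class and the helper names are hypothetical, and the particular tree is only one shape consistent with the description of Figure 1 (seven nodes, root 'E' with children 'B' and 'G', leaf 'C', height 4), since the figure itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A non-empty binary tree; the empty tree is represented by None."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def height(tree):
    """Number of levels in the tree; the empty tree has height 0."""
    if tree is None:
        return 0
    return 1 + max(height(tree.left), height(tree.right))

def size(tree):
    """Number of nodes in the tree."""
    if tree is None:
        return 0
    return 1 + size(tree.left) + size(tree.right)

def is_leaf(node):
    """A leaf is the root of two empty subtrees."""
    return node.left is None and node.right is None

# Assumed shape: E(B(A, D(C)), G(F)).
tree = Node("E",
            Node("B", Node("A"), Node("D", Node("C"))),
            Node("G", Node("F")))

print(size(tree), height(tree), is_leaf(tree.left.right.left))
```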
Figure 1. A binary tree. 


Figure 2. A perfectly balanced, full tree. 


Although the binary tree is restricted somewhat, it is apparent that a large 
number of tree structures are possible given a particular set of node values. At one 
extreme, Figure 2 shows a perfectly balanced tree where each node except for the 
leaves has both a left and a right child and all leaves appear at the same level. A 
tree such as this is said to be full because it contains the maximum number of 
nodes for a given height. At the other extreme, Figure 3 shows a worst case 
degenerate tree where each node has only 1 child except for the single leaf. 


Claim 1. For a binary tree T of height H: (a) the minimum number of nodes is H
and (b) the maximum number of nodes is 2^H − 1.


Proof: (a) By definition, the tree must have at least 1 node on each level.
Therefore, in the minimum case, the tree could contain as few as H nodes and still
satisfy this property.

(b) In the maximum case the proof is by induction on the height of the tree:

(base) H = 1: If T is of height 1 then there is only one node, which must be the
root node, and 2^1 − 1 = 1.

(hyp) H = i: number of nodes is 2^i − 1.

(induction) H = i + 1: By definition, a tree of height i + 1 with the maximum
number of nodes must consist of a root node and two subtrees each of height i.
By the inductive hypothesis, the subtrees must each have 2^i − 1 nodes so that
total number is 1 + (2^i − 1) + (2^i − 1) = 2(2^i) − 1 = 2^(i+1) − 1. ∎


Figure 3. A degenerate binary tree.


One common application for any data structure is that of search. The problem 
is to look into the structure and find a selected node value. For this purpose, a 
special type of binary tree, called a binary search tree is often enlisted. A binary 
search tree is defined as a binary tree where the structural restrictions are related 
to the node values. In particular, the binary search tree is constructed such that the 
left child of any node must be less than that node, and the right child of any node 
must be greater. With respect to the recursive definition, all the values in the left
subtree must be less than the root, and all the values in the right subtree must be 
greater. This relationship is then required to hold for all subtrees. 

If we consider the node values in Figure 1 as having standard alphabetical 
ordering, then this binary tree is in fact a binary search tree. Likewise, Figures 2 
and 3 also satisfy the binary search tree property on the same values, albeit with 
different shapes. In contrast, Figure 4 is not a binary search tree as the value B 
appears to the left of A (to name just one problematic pair). 



Figure 4. A binary tree lacking the binary search tree properties. 


The search algorithm (given below in Pascal-like pseudocode) for binary search 
trees can be written as a simple recursive procedure which is derived directly from 
the recursive definition of binary trees. 
procedure BstSearch (Tree, Value);
{Given a binary search Tree and a Value, search the structure for an occurrence of
the value}
begin
    if Tree is empty then {not found}
    otherwise
        if Tree.RootValue = Value then {found it}
        otherwise
            if Value < Tree.RootValue then
                BstSearch(Tree.LeftSubtree, Value) {must be in the left subtree}
            otherwise
                BstSearch(Tree.RightSubtree, Value) {must be in the right subtree}
end.
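
For readers who want to run the procedure, here is one possible rendering in Python. It is a sketch, not the article's code: the `Node` class, the function name `bst_search`, the Boolean return value, and the seven-node example tree (a balanced tree on the letters A to G, assumed for illustration) are all choices made here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A non-empty binary search tree; the empty tree is represented by None."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def bst_search(tree, value):
    """Return True if value occurs in the binary search tree, following the pseudocode above."""
    if tree is None:
        return False                              # not found
    if tree.value == value:
        return True                               # found it
    if value < tree.value:
        return bst_search(tree.left, value)       # must be in the left subtree
    return bst_search(tree.right, value)          # must be in the right subtree

# Assumed example: a perfectly balanced tree on seven values.
balanced = Node("D",
                Node("B", Node("A"), Node("C")),
                Node("F", Node("E"), Node("G")))

print(bst_search(balanced, "E"), bst_search(balanced, "H"))
```

Each recursive call descends one level, so the number of calls never exceeds the height of the tree plus one.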


In the worst scenario, as in the degenerate tree case of Figure 3, this algorithm will 
compare against every value in the tree, in essence performing an exhaustive, 
sequential search upon the elements. However, in a best case situation such as the 
perfectly balanced case of Figure 2, the algorithm will compare against a maximum 
of only 3 values as it divides the search space in half every time it performs a 
compare. These ideas are summarized in the following claim: 


Claim 2. Let T be a binary search tree containing n node values. (a) The worst 
case performance (in terms of compares necessary) for BstSearch is proportional 
to n. (b) The best case performance for BstSearch is proportional to log_2(n).


Proof: It can be seen from the above procedure that each compare occurs on a
new level within the tree. Therefore, the number of compares is equal to the
number of levels or height of the tree. Using Claim 1, in the degenerate case there
is only one value on each level and therefore with n values there will be n
compares. In the perfectly balanced case, a tree of height H has 2^H − 1 nodes and
therefore if a tree has n nodes, the height of such a tree will be bounded by
log_2(n + 1) making the number of compares proportional to log_2(n). ∎
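
The gap between the two cases of Claim 2 is easy to tabulate. The sketch below is an illustration with hypothetical function names; it uses Claim 1(b) to find the smallest height that can hold n nodes and compares it with the n compares of the degenerate tree.

```python
import math

def max_nodes(height):
    """Claim 1(b): the most nodes a binary tree of the given height can hold."""
    return 2 ** height - 1

def min_height(n):
    """Smallest height H of a binary tree with n nodes, i.e. the least H with 2**H - 1 >= n."""
    h = 0
    while max_nodes(h) < n:
        h += 1
    return h

def worst_case_compares(n):
    """Degenerate tree: one node per level, so up to n compares."""
    return n

def best_case_compares(n):
    """Minimum-height tree: at most one compare per level."""
    return min_height(n)

for n in (7, 100, 1_000_000):
    print(n, worst_case_compares(n), best_case_compares(n), math.log2(n + 1))
```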


This elementary analysis of binary search tree performance leads to the conclu- 
sion that search is better in the case where the height of the tree is minimized. 
Unfortunately, it is not possible in the general case to have a tree that is perfectly 
balanced since only certain numbers of nodes can lead to such a tree. Further, 
although minimum height trees are possible for any given number of nodes, the 
construction is time consuming and is usually not cost effective. It is therefore 
necessary to come to some compromise as to the shape of a tree in terms of 
performance. 


AVL BINARY SEARCH TREES. Given any binary search tree, it may be possible 
to rearrange the nodes so that the resulting new tree is shorter in height and will 
therefore offer better performance. However, the work involved in this balancing 
process can offset the expected gain in search performance. An AVL binary search 
tree (named after Russian mathematicians G. M. Adel’son-Vel’skii and E. M. 
Landis) is one in which the height is not necessarily minimal but is kept close 
enough that good performance can still be expected. In particular, an AVL binary 
search tree is defined as a binary search tree where the heights of all left and right 




Figure 5. A leftmost AVL binary search tree. 


subtrees differ by at most 1. Clearly, the full tree from Figure 2 can be considered 
AVL since the height difference for any left and right subtree is in fact zero. 
Figure 1 is also an AVL binary search tree. 

What is the expected performance for an AVL binary search tree? It can be 
seen that the best case for an AVL tree is still that of the perfectly balanced, full 
tree. Therefore, the number of compares is still logarithmic in the number of 
nodes in the tree. On the other hand, the worst case will arise when each subtree is 
as much “out of balance” as possible. Figure 5 shows one such case. Here the AVL 
tree is worst case in that the height of each left subtree is one greater than the 
height of its corresponding right subtree. This leftmost AVL tree represents one 
example of a tree which maintains the AVL properties and has the maximum 
height for this number of nodes. The analysis is not quite so obvious. 


Claim 3. The height of a leftmost AVL binary search tree is proportional to the 
log of the number of nodes in the tree. 


Proof: Let $N_H$ be the number of nodes in a leftmost AVL binary search tree of 
height H. Since the height of the left subtree is one more than the height of the 
right subtree and both must also be AVL, the following relationship holds: 

$$N_H = 1 + N_{H-1} + N_{H-2}, \quad \text{where } N_1 = 1 \text{ and } N_2 = 2.$$

This recurrence appears to be very similar to the Fibonacci recurrence stated 
earlier. In fact, careful observation shows that 

$$N_H = F_{H+2} - 1, \quad H \ge 1.$$

By replacing the Fibonacci reference by its golden ratio approximation (Eq. 3), we 
get 

$$N_H \approx \Phi^{H+2}/\sqrt{5} - 1.$$

Rearranging terms, taking logs (base 2) of both sides, and solving for H yields 

$$\lg(N_H + 1) \approx (H + 2)\lg(\Phi) - (1/2)\lg(5)$$

or 

$$H \approx [\lg(N_H + 1) - 2\lg(\Phi) + (1/2)\lg(5)] / \lg(\Phi)$$

or 

$$H \approx 1.44\,\lg(N_H).$$

This result shows that even in worst case, AVL trees still maintain the logarithmic 
search behavior with respect to the number of nodes in the tree since as before 
search is related to the height of the tree. 
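The relation between the leftmost AVL recurrence and the Fibonacci numbers can be checked numerically. The sketch below assumes the base cases $N_1 = 1$, $N_2 = 2$ and the indexing $F_1 = F_2 = 1$; the function names are hypothetical.

```python
import math

def leftmost_avl_nodes(height):
    """N_H from N_H = 1 + N_{H-1} + N_{H-2}, assuming N_1 = 1 and N_2 = 2."""
    prev, cur = 1, 2                      # N_1, N_2
    if height == 1:
        return prev
    for _ in range(height - 2):
        prev, cur = cur, 1 + cur + prev
    return cur

def fib(n):
    """Fibonacci numbers with F_1 = F_2 = 1 (indexing assumed)."""
    a, b = 0, 1                           # F_0, F_1
    for _ in range(n):
        a, b = b, a + b
    return a

# N_H = F_{H+2} - 1 for the first thirty heights.
matches_fibonacci = all(
    leftmost_avl_nodes(H) == fib(H + 2) - 1 for H in range(1, 31)
)

# The height stays below 1.4405 * lg(N_H + 1); note 1 / lg(Phi) is about 1.4404.
within_bound = all(
    H <= 1.4405 * math.log2(leftmost_avl_nodes(H) + 1) for H in range(1, 31)
)

# The ratio H / lg(N_H) creeps up toward 1.44 as the tree grows.
ratios = [H / math.log2(leftmost_avl_nodes(H)) for H in (10, 20, 30)]
print(matches_fibonacci, within_bound, ratios)
```

The ratios approach the constant 1.44 from below, which is the sense in which the worst-case AVL height is proportional to the log of the number of nodes.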


ADDITIONAL APPLICATIONS. The above analysis is interesting not only from a 
data structures and algorithms point of view but also due to the inclusion of the 
Fibonacci result. This observation made a seemingly difficult analysis quite modest 
since once the AVL recurrence was stated in terms of Fibonacci, most of the work 
was done. The usefulness of the Fibonacci numbers appears elsewhere in the 
computer science literature as well. 

Consider the alphabet consisting of the symbols a and b. A word can then be 
defined as a sequence of elements from the underlying alphabet. Given this 
background, it is possible to define the notion of a Fibonacci Word as 


$FW_1 = a$, 
$FW_2 = b$, 
$FW_n = FW_{n-1}FW_{n-2}$ for all $n > 2$ 


where $FW_n$ is formed by concatenating $FW_{n-2}$ to the right of $FW_{n-1}$. As 
examples, $FW_3 = ba$, $FW_4 = bab$, and $FW_5 = babba$. Although these words may 
seem peculiar, it turns out that they pose a worst case situation for a number of 
fast pattern matching algorithms [1] which attempt to solve the typical text editing 
problem of trying to locate a particular sequence of characters in a text file. 
Interested readers can find additional variations on the Fibonacci theme including 
Fibonacci Heaps, Fibonacci Merge, and Fibonacci Search in many data structures 
and algorithms textbooks [2]. 
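The definition of the Fibonacci words translates directly into a few lines of code. The following Python sketch (the function name `fibonacci_word` is hypothetical) builds the words by repeated concatenation and shows that their lengths are the Fibonacci numbers.

```python
def fibonacci_word(n):
    """Return FW_n, where FW_1 = 'a', FW_2 = 'b', FW_n = FW_{n-1} + FW_{n-2}."""
    if n < 1:
        raise ValueError("n must be at least 1")
    prev, cur = "a", "b"                  # FW_1, FW_2
    if n == 1:
        return prev
    for _ in range(n - 2):
        # Concatenate the older word to the right of the newer one.
        prev, cur = cur, cur + prev
    return cur

words = [fibonacci_word(n) for n in range(1, 8)]
lengths = [len(w) for w in words]
print(words)
print(lengths)
```

The third, fourth, and fifth entries of `words` are the examples ba, bab, and babba given above.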
Abstractness, sometimes hurled as a reproach at mathematics, is its 
chief glory and its surest title to practical usefulness. It is also the 
source of such beauty as may spring from mathematics. 


—E. T. Bell 
THE EVOLUTION OF... 
Edited by Abe Shenitzer 
Mathematics, York University, North York, Ontario M3J 1P3, Canada 


Part II. Topology and Abstract Algebra 
as Two Roads of Mathematical 


Comprehension* 
Unterrichtsblätter für Mathematik und Naturwissenschaften 38, 177-188 (1932). (A 
lecture in the summer course of the Swiss Society of Gymnasium Teachers, given 
in Bern, in October 1931.) 


Hermann Weyl 


Note: The first part of this article appeared in 1995, in the May issue of the 
Monthly (pp. 453). What follows is a short summary of the first part and the 
concluding part of the article. 


SUMMARY OF PART I. Weyl begins by saying that 


We are not very pleased when we are forced to accept a mathematical truth by virtue of a 
complicated chain of formal conclusions and computations, which we traverse blindly, link by 
link, feeling our way by touch. We want first an overview of the aim and of the road; we want to 
understand the idea of the proof, the deeper context. 


and goes beyond this familiar notion of understanding to what he calls modes of 
understanding. These are ways of looking at mathematics as well as the branches 
of mathematics associated with them. Two such modes of understanding “have 
proved, in our time, to be especially penetrating and fruitful. The two are topology 
and abstract algebra.” 

Now the discussion begins to involve some technical matters. Weyl explains 
what is meant by purely topological investigations of continua, discusses the 
motives that have led to the development of abstract algebra, and uses “a simple 
example to show how the same issue can be looked at from a topological and from 
an abstract-algebraic viewpoint.” The (not so simple) example which he considers 
from the two viewpoints is the theory of algebraic functions of a single variable. 


After all these general remarks I want to use two simple examples that illustrate 
the different kinds of concept building in algebra and in topology. The classical 
example of the fruitfulness of the topological method is Riemann’s theory of 


*The original German version of this article is found in vol. 3, pp. 348-358, of the four-volume 
edition of Hermann Weyl’s collected works published by Springer-Verlag in 1968. The translation is by 
Abe Shenitzer. 
algebraic functions and their integrals. Viewed as a topological surface, a Riemann 
surface has just one characteristic, namely its connectivity number or genus p. For 
the sphere p = 0 and for the torus p = 1. How sensible it is to place topology 
ahead of function theory follows from the decisive role of the topological number 
p in function theory on a Riemann surface. I quote a few dazzling theorems: The 
number of linearly independent everywhere regular differentials on the surface is 
p. The total order (that is, the difference between the number of zeros and the 
number of poles) of a differential on the surface is 2p — 2. If we prescribe more 
than p arbitrary points on the surface, then there exists just one single-valued 
function on it that may have simple poles at these points but is otherwise regular; 
if the number of prescribed poles is exactly p, then, if the points are in general 
position, this is no longer true. The precise answer to this question is given by the 
Riemann-Roch theorem in which the Riemann surface enters only through the 
number p. If we consider all functions on the surface that are everywhere regular 
except for a single place at which they have a pole, then the possible orders of the pole are 
all numbers 1,2,3,... except for certain p of them (the Weierstrass gap theorem). 
It is easy to give many more such examples. The genus p permeates the 
whole theory of functions on a Riemann surface. We encounter it at every step, 
and its role is direct, without complicated computations, understandable from its 
topological meaning (provided that we include, once and for all, the Thomson- 
Dirichlet principle as a fundamental function-theoretic principle). 

The Cauchy integral theorem gives topology the first opportunity to enter 
function theory. The integral of an analytic function over a closed path is 0 only if 
the domain that contains the path and is also the domain of definition of the 
analytic function is simply connected. Let me use this example to show how one 
“topologizes” a function-theoretic state of affairs. If f(z) is analytic, then the 
integral $\int_\gamma f(z)\,dz$ associates with every curve $\gamma$ a number $F(\gamma)$ such that 


(†) $F(\gamma_1 + \gamma_2) = F(\gamma_1) + F(\gamma_2)$. 


$\gamma_1 + \gamma_2$ stands for the curve such that the beginning of $\gamma_2$ coincides with the end 
of $\gamma_1$. The functional equation (†) marks the integral $F(\gamma)$ as an additive path 
function. Also, each point has a neighborhood such that $F(\gamma) = 0$ for each closed 
path $\gamma$ in that neighborhood. I will call a path function with these properties a 
topological integral, or briefly, an integral. In fact, all this concept assumes is that 
there is given a continuous manifold on which one can draw curves; it is the 
topological essence of the analytic notion of an integral. Integrals can be added 
and multiplied by numbers. The topological part of the Cauchy integral theorem 
states that on a simply connected manifold every integral is homologous to 0 (not 
only in the small but in the large), that is, $F(\gamma) = 0$ for every closed curve $\gamma$ on the 
manifold. In this we can spot the definition of “simply connected.” The 
function-theoretic part states that the integral of an analytic function is a topological 
integral in our sense of the term. The definition of the order of connectivity [that 
we are about to state] fits in here quite readily. Integrals $F_1, F_2, \ldots, F_n$ on a closed 
surface are said to be linearly independent if they are not connected by a 
homology relation 


$$c_1F_1 + c_2F_2 + \cdots + c_nF_n \sim 0$$


with constant coefficients $c_i$ other than the trivial one, when all the $c_i$ vanish. The 
order of connectivity of a surface is the maximal number of linearly independent 
integrals. For a closed two-sided surface the order of connectivity h is always an 
even number 2p, where p is the genus. From a homology between integrals we can 
go over to a homology between closed paths. The path homology 


$$n_1\gamma_1 + n_2\gamma_2 + \cdots + n_r\gamma_r \sim 0$$

states that for every integral F we have the equality 

$$n_1F(\gamma_1) + n_2F(\gamma_2) + \cdots + n_rF(\gamma_r) = 0.$$


If we go back to the topological skeleton that decomposes the surface into 
elementary pieces and replace the continuous point-chains of paths by the discrete 
chains constructed out of elementary pieces, then we obtain an expression for the 
order of connectivity h in terms of the numbers s, k and e of pieces, edges and 
vertices. The expression in question is the well-known Euler polyhedral formula 
$h = k - (e + s) + 2$. Conversely, if we start with the topological skeleton, then our 
reasoning yields the result that this combination h of the number of pieces, edges 
and vertices is a topological invariant, namely it has the same value for “equivalent” 
skeletons which represent the same manifold in different subdivisions. 
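As a small worked illustration of the polyhedral formula, the following Python sketch evaluates $h = k - (e + s) + 2$ for a few familiar subdivisions. The function name and the particular subdivisions are choices made here for illustration and are not taken from the lecture.

```python
def order_of_connectivity(pieces, edges, vertices):
    """h = k - (e + s) + 2 with s pieces, k edges and e vertices."""
    return edges - (vertices + pieces) + 2

# Two subdivisions of the sphere: the cube and the tetrahedron.
cube = order_of_connectivity(pieces=6, edges=12, vertices=8)
tetrahedron = order_of_connectivity(pieces=4, edges=6, vertices=4)

# Two subdivisions of the torus: a square with opposite sides identified
# (1 piece, 2 edges, 1 vertex) and a 3-by-3 grid of squares wrapped around
# in both directions (9 pieces, 18 edges, 9 vertices).
torus_coarse = order_of_connectivity(pieces=1, edges=2, vertices=1)
torus_fine = order_of_connectivity(pieces=9, edges=18, vertices=9)

# A surface of genus 2: an octagon with sides identified in pairs
# (1 piece, 4 edges, 1 vertex).
genus_two = order_of_connectivity(pieces=1, edges=4, vertices=1)

print(cube, tetrahedron, torus_coarse, torus_fine, genus_two)
```

Different subdivisions of the same surface give the same value, and in each case the value is 2p: 0 for the sphere, 2 for the torus, 4 for the surface of genus 2.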

When it comes to application to function theory, it is possible, using the 
Thomson-Dirichlet principle, to “realize” the topological integrals as actual 
integrals of everywhere regular-analytic differentials on a Riemann surface. One 
can say that all of the constructive work is done on the topological side, and that 
the topological results are realized in a function-theoretic manner with the help of 
a universal transfer principle, namely the Dirichlet principle. This is, in a sense, 
analogous to analytic geometry, where all the constructive work is carried out in 
the realm of numbers, and then the results are geometrically “realized” with the 
help of the transfer principle lodged in the coordinate concept. 

All this is seen more perfectly in uniformization theory, which plays a central 
role in all of function theory. But at this point, I prefer to point to another 
application which is probably close to many of you. I have in mind enumerative 
geometry, which deals with the determination of the number of points of intersec- 
tion, singularities, and so on, of algebraic relational structures, which was made 
into a general, but very poorly justified, system by Schubert and Zeuthen. Here, in 
the hands of Lefschetz and v.d. Waerden, topology achieved a decisive success in 
that it led to definitions of multiplicity valid without exception, as well as to laws 
likewise valid without exception. Of two curves on a two-sided surface one can 
cross the other at a point of intersection from left to right or from right to left. 
These points of intersection must enter every setup with opposite weights +1 and 
—1. Then the total of the weights of the intersections (which can be positive or 
negative) is invariant under arbitrary continuous deformations of the curves; in 
fact, it remains unchanged if the curves are replaced by homologous curves. Hence 
it is possible to master this number through finite combinatorial means of topology 
and obtain transparent general formulas. Two algebraic curves are, actually, two 
closed Riemann surfaces embedded in a space of four real dimensions by means of 
an analytic mapping. But in algebraic geometry a point of intersection is counted 
with positive multiplicity, whereas in topology one takes into consideration the 
sense of the crossing. This being so, it is surprising that one can resolve the 
algebraic question by topological means. The explanation is that in the case of an 
analytic manifold, crossing always takes place with the same sense. If the two 
curves are represented in the $x_1, x_2$-plane in the vicinity of their point of intersection 
by the functions $x_1 = x_1(s)$, $x_2 = x_2(s)$, and $x_1 = x_1^*(t)$, $x_2 = x_2^*(t)$, then the 
sense ±1 with which the first curve intersects the second is given by the sign of the 
Jacobian 

$$\begin{vmatrix} \dfrac{dx_1}{ds} & \dfrac{dx_2}{ds} \\[2mm] \dfrac{dx_1^*}{dt} & \dfrac{dx_2^*}{dt} \end{vmatrix} = \frac{\partial(x_1, x_2)}{\partial(s, t)}$$

evaluated at the point of intersection. In the case of complex-algebraic “curves” 
this criterion always yields the value +1. Indeed, let $z_1, z_2$ be complex coordinates 
in the plane and let s and t be the respective complex parameters on the two 
“curves.” The real and imaginary parts of $z_1$ and $z_2$ play the role of real 
coordinates in the plane. In their place we can take $z_1, \bar z_1, z_2, \bar z_2$. But then the 
determinant whose sign determines the sense of the crossing is 

$$\frac{\partial(z_1, \bar z_1, z_2, \bar z_2)}{\partial(s, \bar s, t, \bar t)} = \frac{\partial(z_1, z_2)}{\partial(s, t)} \cdot \frac{\partial(\bar z_1, \bar z_2)}{\partial(\bar s, \bar t)} = \left| \frac{\partial(z_1, z_2)}{\partial(s, t)} \right|^2$$



and thus invariably positive. Note that the Hurwitz theory of correspondence 
between algebraic curves can likewise be reduced to a purely topological core. 

On the side of abstract algebra, I will emphasize just one fundamental concept, 
namely the concept of an ideal. If we use the algebraic method, then an algebraic 
manifold is given in 3-dimensional space with complex cartesian coordinates x, y, z 
by means of a number of simultaneous equations 


$$f_1(x, y, z) = 0, \ldots, f_n(x, y, z) = 0.$$


The $f_i$ are polynomials. In the case of a curve it is not at all true that two 
equations suffice. Not only do the polynomials $f_i$ vanish at points of the manifold 
but also every polynomial f of the form 


(**) $f = A_1f_1 + \cdots + A_nf_n$ ($A_i$ are polynomials). 


Such polynomials f form an “ideal” in the ring of polynomials. Dedekind defined 
an ideal in a given ring as a system of ring elements closed under addition and 
subtraction as well as under multiplication by ring elements. This concept is not 
too broad for our purposes. The reason is that, according to the Hilbert basis 
theorem, every ideal in the polynomial ring has a finite basis; there are finitely 
many polynomials $f_1, \ldots, f_n$ in the ideal such that every polynomial in the ideal 
can be written in the form (**). Hence the study of algebraic manifolds reduces to 
the study of ideals. On an algebraic surface there are points and algebraic curves. 
The latter are represented by ideals that are divisors of the ideal under considera- 
tion. The fundamental theorem of M. Noether deals with ideals whose manifold of 
zeros consists of finitely many points, and makes membership of a polynomial in 
such an ideal dependent on its behavior at these points. This theorem follows 
readily from the decomposition of an ideal into prime ideals. The investigations of 
E. Noether show that the concept of an ideal, first introduced by Dedekind in the 
theory of algebraic number fields, runs through all of algebra and arithmetic like 
Ariadne’s thread. v.d. Waerden was able to justify the enumerative calculus by 
means of the algebraic resources of ideal theory. 

If one operates in an arbitrary abstract number field rather than in the 
continuum of complex numbers, then the fundamental theorem of algebra, which 
asserts that every complex polynomial in one variable can be [uniquely] decom- 
posed into linear factors, need not hold. Hence the general prescription in 
algebraic work: See if a proof makes use of the fundamental theorem or not. In 
every algebraic theory there is a more elementary part that is independent of the 
fundamental theorem, and therefore valid in every field, and a more advanced part 
for which the fundamental theorem is indispensable. The latter part calls for the 
algebraic closure of the field. In most cases the fundamental theorem marks a 
crucial split; its use should be avoided as long as possible. To establish theorems 
that hold in an arbitrary field it is often useful to embed the given field in a larger 
field. In particular, it is possible to embed any field in an algebraically closed field. 
A well-known example is the proof of the fact that a real polynomial can be 
decomposed over the reals into linear and quadratic factors. To prove this, we 
adjoin i to the reals and thus embed the latter in the algebraically closed field of 
complex numbers. This procedure has an analogue in topology which is used in the 
study and characterization of manifolds; in the case of a surface, this analogue 
consists in the use of its covering surfaces. 

At the center of today’s interest is noncommutative algebra in which one does 
not insist on the commutativity of multiplication. Its rise is dictated by concrete 
needs of mathematics. Composition of operations is a kind of noncommutative 
operation. Here is a specific example. We consider the symmetry properties of 
functions $f(x_1, x_2, \ldots, x_n)$ of a number of arguments. The latter can be subjected 
to an arbitrary permutation s. A symmetry property is expressed in one or more 
equations of the form 


$$\sum_s a(s) \cdot sf = 0.$$


Here a(s) stands for the numerical coefficients associated with the permutation. 
These coefficients belong to a given field K. $\sum_s a(s) \cdot s$ is a “symmetry operator.” 
These operators can be multiplied by numbers, added and multiplied, that is, 
applied in succession. The result of the latter operation depends on the order of 
the “factors.” Since all formal rules of computation hold for addition and multipli- 
cation of symmetry operators, they form a “noncommutative ring” (hypercomplex 
number system). The dominant role of the concept of an ideal persists in the 
noncommutative realm. In recent years, the study of groups and their representa- 
tions by linear substitutions has been almost completely absorbed by the theory of 
noncommutative rings. Our example shows how the multiplicative group of n! 
permutations s is extended to the associated ring of magnitudes $\sum_s a(s) \cdot s$ that 
admit, in addition to multiplication, addition and multiplication by numbers. 
Quantum physics has given noncommutative algebra a powerful boost. 

Unfortunately, I cannot here produce an example of the art of building an 
abstract-algebraic theory. It consists in setting up the right general concepts, such 
as fields, ideals, and so on, in decomposing an assertion to be proved into steps 
(for example, an assertion “A implies B,” or A → B, may be decomposed into 
steps A → C, C → D, D → B), and in the appropriate generalization of these 
partial assertions in terms of general concepts. Once the main assertion has been 
subdivided in this way and the inessential elements have been set aside, the proofs 
of the individual steps do not, as a rule, present serious difficulties. 

Whenever applicable, the topological method appears, thus far, to be more 
effective than the algebraic one. Abstract algebra has not yet produced successes 
comparable to the successes of the topological method in the hands of Riemann. 
Nor has anyone reached by an algebraic route the peak of uniformization scaled 
topologically by Klein, Poincaré and Koebe. Here are questions to be answered in 
the future. But I do not want to conceal from you the growing feeling among 
mathematicians that the fruitfulness of the abstracting method is close to exhaustion. 
It is a fact that beautiful general concepts do not drop out of the sky. The 
truth is that, to begin with, there are definite concrete problems, with all their 
undivided complexity, and these must be conquered by individuals relying on brute 
force. Only then come the axiomatizers and conclude that instead of straining to 
break in the door and bloodying one’s hands one should have first constructed a 
magic key of such and such shape and then the door would have opened quietly, as 
if by itself. But they can construct the key only because the successful break- 
through enables them to study the lock front and back, from the outside and from 
the inside. Before we can generalize, formalize and axiomatize there must be 
mathematical substance. I think that the mathematical substance on which we have 
practiced formalization in the last few decades is near exhaustion and I predict 
that the next generation will face in mathematics a tough time. 


[The sole purpose of this lecture was to give the audience a feeling for the 
intellectual atmosphere in which a substantial part of modern mathematical 
research is carried out. For those who wish to penetrate more deeply I give a few 
bibliographical suggestions. The true pioneers of abstract axiomatic algebra are 
Dedekind and Kronecker. In our own time, this orientation has been decisively 
advanced by Steinitz, by E. Noether and her school, and by E. Artin. The first 
great advance in topology came in the middle of the 19th century and was due to 
Riemann’s function theory. The more recent developments are linked primarily to 
a few works of H. Poincaré devoted to analysis situs (1895-1904). I mention the 
following books: 

1. On algebra: Steinitz, Algebraic Theory of Fields, appeared first in Crelles Journal in 1910. It was 
H. Hasse, Higher algebra I, II. Sammlung Göschen 1926/27. 
B. v.d. Waerden, Modern algebra I, II. Springer 1930/31. 

2. On topology: H. Weyl, The Idea of a Riemann Surface, second ed. Teubner 1923. 

3. Volume I of F. Klein, History of Mathematics in the 19th Century, Springer 1926. 


Gauss once said, “Mathematics is the queen of the sciences and 
number theory the queen of mathematics.” If this is true we may add 
that the Disquisitiones is the Magna Charta of number theory. 


—M. Cantor 
Allgemeine Deutsche Biographie, Bd. 8. 1878 p. 435. 
PROBLEMS AND SOLUTIONS 


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


Proposed problems should be sent to the MONTHLY PROBLEMS address given on the 
inside front cover. Please include solutions and relevant references. Three copies 
of all items needed to evaluate the problem should be sent. 


Solutions of published problems should arrive at the MONTHLY PROBLEMS address 
given on the inside front cover before February 29, 1996. If possible, solutions 
should be typed with double spacing. Two copies suffice. Several solutions may 
be mailed together, but they should be on separate sheets of paper. The problem 
number and the solver’s name and mailing address should appear on each solution. 
A mailing label should be included if an acknowledgment is desired. 


The published solution is likely to be based on a solution that is complete and 
correct. Additional information, such as references to other appearances of the 
problem or its solution, is also welcome. 


An asterisk (*) after the number of a problem, or part of a problem, indicates that 
no solution is currently available. 


PROBLEMS 


10466. Proposed by E. Sparre Andersen & Mogens Esrom Larsen, Københavns Universitet, 
København, Danmark. 


For $x \in \mathbb{C}$ and $n \in \mathbb{N}$, prove the following identities between polynomials. 
n 1 _ 1 _ 9) n ° _ ° 
(a) (—4)" > (** °) (" *) = ( ") > (* r!)( ~~ J )) Forallm € N, 
20 J 2n— J n} <= 2] 2n — 2] 
with 0 < m < 2n, generalize (a) to 


“.(x+¥\ (n-1-—x an\ "Eels 7 ej x-—J 
HO OT MT) = Cn) 2 (oi Sn) (on may) 
j=0 j=—[m/2 
10467. Proposed by Joseph E. Higgins, Cadence Design Systems, Inc., San José, CA. 


It seems geometrically evident that in a normed space X, the operator $\varphi: X \setminus \{0\} \to X$ 
defined by $\varphi(x) = x / |x|$ would satisfy the Lipschitz inequality $|\varphi(x) - \varphi(y)| \le |x - y|$ 
whenever $|x| \ge 1$ and $|y| \ge 1$. Prove, or give a counterexample. 
10468. Proposed by James E. Baumgartner & Benjamin J. Tilly (student), Dartmouth 
College, Hanover, NH. 


Let F be a field and let $F^{\mathbb{N}}$ be the F-vector space of all functions from the nonnegative 
integers into F. What is the dimension of $F^{\mathbb{N}}$? 


10469. Proposed by Jean Anglesio, Garches, France. 


Let P be a point in the interior of the triangle ABC and let the lines AP, BP, CP meet 
the sides BC, CA, AB respectively at the points D, E, F. Let the circles on diameters BC 
and AD intersect at points a, a’; the circles on diameters CA and BE intersect at points b, b’; 
and the circles on diameters AB and CF intersect at points c, c’. Show that a, a’, b, b’, c, c’ 
lie on a circle. 


10470. Proposed by Donald E. Knuth, Stanford University, Stanford, CA. 
Call a matrix $(a_{ij})$ special if its entries satisfy 

$$a_{ij} = \begin{cases} 0, & \text{if } j > i + 1; \\ -1, & \text{if } j = i + 1; \\ 0 \text{ or } 1, & \text{if } j \le i. \end{cases}$$


Call a special matrix minimal if its determinant is zero, but the determinant becomes nonzero 
when any element on or below the diagonal is changed from 0 to 1. For example, 


$$\begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 1 & 1 & 1 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$


is a minimal special matrix. 
(a) Prove that there are exactly $2^{n-1}$ minimal special matrices of size n by n. 
(b) What is the largest number of zeros that a minimal special matrix can contain? 
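Readers who wish to experiment with problem 10470 before attempting a proof can enumerate the small cases by machine. The following Python sketch is a brute-force exploration only, not a solution; the function names are hypothetical, and "special" is read as: zero above the superdiagonal, -1 on the superdiagonal, and 0 or 1 on or below the diagonal.

```python
from itertools import product

def det(m):
    """Integer determinant by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        if m[0][j] == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def special_matrices(n):
    """Yield every n-by-n special matrix as a list of rows."""
    free = [(i, j) for i in range(n) for j in range(i + 1)]  # on or below diagonal
    for bits in product((0, 1), repeat=len(free)):
        m = [[0] * n for _ in range(n)]
        for i in range(n - 1):
            m[i][i + 1] = -1
        for (i, j), b in zip(free, bits):
            m[i][j] = b
        yield m

def is_minimal(m):
    """Zero determinant, and nonzero after changing any single free 0 to 1."""
    if det(m) != 0:
        return False
    for i in range(len(m)):
        for j in range(i + 1):
            if m[i][j] == 0:
                m[i][j] = 1
                changed = det(m)
                m[i][j] = 0          # restore the entry before continuing
                if changed == 0:
                    return False
    return True

example = [[1, -1, 0, 0], [0, 0, -1, 0], [1, 1, 1, -1], [0, 0, 1, 0]]
counts = {n: sum(is_minimal(m) for m in special_matrices(n)) for n in range(1, 5)}
print(is_minimal(example), counts)
```

The search space has $2^{n(n+1)/2}$ matrices, so this approach is practical only for very small n.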


10471. Proposed by Stephen Semmes & Richard Stong, Rice University, Houston, TX. 


Suppose V is a (possibly infinite dimensional) vector space over C and $P_1, P_2, \ldots, P_n$ 
are projections on V. For which n is it the case that if $P_1 + P_2 + \cdots + P_n = 0$, then all the 
$P_i$ are zero? 


10472. Proposed by Edward Kitchen, Santa Monica, CA. 


Let $P_0P_1P_2P_3P_4$ be a convex pentagon that is affinely equivalent to a regular pentagon. 
Let $L_j$ be the center of a rotation through $+\pi/5$ radians taking $P_{j+2}$ to $P_{j-2}$ (all subscripts 
modulo 5). Show that $P_j$ is the center of a rotation through $-3\pi/5$ radians taking $L_{j-1}$ to 
$L_{j+1}$. 


NOTES 


(10471) A projection on V is a linear function $P : V \to V$ such that $P^2 = P$. (10472) In 
chapter IX of A Survey of Modern Algebra by Birkhoff and MacLane, the affine group in a 
vector space is constructed from translations and nonsingular linear transformations. If one 
figure can be taken to another by an element of this group, the figures are said to be affinely 
equivalent. 
SOLUTIONS 


The Inverse of a Block Matrix 


10186 [1992, 60]. Proposed by Lawrence A. Harris, University of Kentucky, Lexington, KY. 


Suppose A, B, and C are matrices of size m by n, m by m, and n by m, respectively, 
and suppose that CA is equal to the n by n identity matrix. Give a necessary and sufficient 
condition for the block matrix 

$$M = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}$$


to be invertible and find an expression for $M^{-1}$ when this condition holds. 


Solution by University of Wyoming Problem Circle, University of Wyoming, Laramie, 
WY. Suppose that A, B and C satisfy the conditions of the problem. Define the m by m 
matrices $P := AC$ and $Q := I - AC$. Then the following statements are equivalent: 

(1) M is invertible; 

(2) $N(QB) \cap N(C) = \{0\}$; 

(3) the m by m matrix $P + QBQ$ is invertible. 
(In (2), N(X) denotes the nullspace of the matrix X.) When condition (3) holds, introduce 
$N := (P + QBQ)^{-1}Q$ and $V := A - NBA$. Then one has 


$$M^{-1} = \begin{pmatrix} C - CBN & -CBV \\ N & V \end{pmatrix}.$$


The following useful identities follow from $CA = I$: $CP = C$, $CQ = 0$, $PA = A$, 
$QA = 0$, $P + Q = I$, $P^2 = P$, $Q^2 = Q$, $PQ = 0 = QP$. 


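Proof that (1) ⇒ (2). Let $y \in N(QB) \cap N(C)$. Then $\begin{pmatrix} -CBy \\ y \end{pmatrix} \in N(M)$ since 

$$\begin{pmatrix} A & B \\ 0 & C \end{pmatrix} \begin{pmatrix} -CBy \\ y \end{pmatrix} = \begin{pmatrix} A(-CBy) + By \\ Cy \end{pmatrix} = \begin{pmatrix} QBy \\ Cy \end{pmatrix} = 0.$$

Assuming (1), M is invertible, so N(M) = {0}, giving y = 0. 

Proof that (2) ⇒ (3). Let $y \in N(P + QBQ)$. Then $0 = C(P + QBQ)y = CPy + CQBQy = CPy = Cy$. 
Hence $Py = ACy = 0$ and $Qy = y$. Consequently, $0 = (P + QBQ)y = QBy$. 
Thus, $N(P + QBQ) \subseteq N(QB) \cap N(C)$. Assuming (2), this is {0} 
and $P + QBQ$ is one-to-one, hence invertible. 

Proof that (3) ⇒ (1). Assuming (3), we have the matrices N and V introduced above. 
Since $(P + QBQ)P = P^2 = P$, we have $(P + QBQ)^{-1}P = P$. Thus 
$NBQ = (P + QBQ)^{-1}QBQ = (P + QBQ)^{-1}((P + QBQ) - P) = I - P = Q$. Now, 
$NB + VC = NB + AC - NBAC = NB(I - AC) + AC = NBQ + P = Q + P = I$. 
Also $NA = 0$. To complete the proof, we verify the formula for $M^{-1}$. 

$$\begin{pmatrix} C - CBN & -CBV \\ N & V \end{pmatrix} \begin{pmatrix} A & B \\ 0 & C \end{pmatrix} = \begin{pmatrix} CA - CBNA & CB - CBNB - CBVC \\ NA & NB + VC \end{pmatrix} = \begin{pmatrix} I & CB(I - NB - VC) \\ 0 & NB + VC \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}.$$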
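The formula for $M^{-1}$ can be sanity-checked on a small instance with exact rational arithmetic. In the following Python sketch the matrices A, B, C (with m = 2, n = 1) and all helper names are choices made here for illustration; only the definitions of P, Q, N, V and the block formula come from the solution.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def neg(X):
    return [[-a for a in row] for row in X]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def inverse(X):
    """Gauss-Jordan inverse over the rationals (fails if X is singular)."""
    n = len(X)
    aug = [list(row) + identity(n)[i] for i, row in enumerate(X)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def block(TL, TR, BL, BR):
    """Assemble a 2-by-2 block matrix from its four blocks."""
    return [a + b for a, b in zip(TL, TR)] + [a + b for a, b in zip(BL, BR)]

m, n = 2, 1
A = [[Fraction(1)], [Fraction(0)]]                       # m by n
C = [[Fraction(1), Fraction(0)]]                         # n by m, with CA = I
B = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]

P = matmul(A, C)
Q = sub(identity(m), P)
N = matmul(inverse(add(P, matmul(matmul(Q, B), Q))), Q)  # (P + QBQ)^{-1} Q
V = sub(A, matmul(matmul(N, B), A))                      # A - NBA
CB = matmul(C, B)

M = block(A, B, [[Fraction(0)] * n for _ in range(n)], C)
M_inv = block(sub(C, matmul(CB, N)), neg(matmul(CB, V)), N, V)
product = matmul(M_inv, M)
print(product == identity(m + n))
```

Exact fractions are used instead of floating point so that the comparison with the identity matrix is not affected by rounding.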
Editorial comment. Several solvers noted that the condition $CA = I$ implies that $n \le m$. 
The proposer’s solution verified directly that one can replace the given B by the identity 
matrix to obtain an invertible matrix $M_1$, and M is invertible if and only if $M_1^{-1}M$ is. The 
latter matrix is a block upper triangular matrix with diagonal blocks of I and P + QB and 
thus is invertible (with the inverse given by a standard formula) if and only if P + QB 
is invertible. While the selected solution uses finite dimensionality, since invertibility is 
deduced from having a trivial nullspace, the proposer’s approach may be easily extended to 
infinite dimensional spaces. Indeed, the problem was inspired by research at this level of 
generality (see Lawrence A. Harris, “Linear fractional transformations of circular domains 
in operator spaces”, Indiana Univ. Math. J. 41 (1992), 125-147). 


Solved also by Y. Ikeda, O. Krafft (Germany), J. H. Lindsey II, A. Nijenhuis, J. H. Steelman, O. Wyler, Westmont 
College Problem Solving Group, and the proposer. Seven incorrect or incomplete solutions were received. 


A Semi-unfriendly Identity 


10206[1992, 266]. Proposed by David M. Bloom, Brooklyn College of CUNY, Brooklyn, 
NY. 


If m and k are positive integers, prove that 

$$\sum_r \binom{r}{k-r} \binom{m}{r} = \sum_j \binom{\lfloor j/2 \rfloor}{k-j} \binom{m - k + \lfloor 3j/2 \rfloor}{j}.$$


Solution by Robin J. Chapman, University of Exeter, Exeter, U. K.. We analyze the 
generating functions of the two sides of the proposed identity as formal power series. For 
the left side: 


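$$\begin{aligned} L(x, y) &= \sum_{m,k,r} \binom{r}{k-r} \binom{m}{r} x^m y^k \\ &= \sum_r \frac{x^r \, [y(1+y)]^r}{(1-x)^{r+1}} \\ &= \frac{1}{1-x} \sum_r \left[ \frac{xy(1+y)}{1-x} \right]^r \\ &= \frac{1}{(1-x) - xy(1+y)} \\ &= \frac{1}{1 - x(1+y+y^2)}. \end{aligned}$$

For the right side: 

$$\begin{aligned} R(x, y) &= \sum_{m,k,j} \binom{\lfloor j/2 \rfloor}{k-j} \binom{m - k + \lfloor 3j/2 \rfloor}{j} x^m y^k \\ &= \sum_{k,j} \binom{\lfloor j/2 \rfloor}{k-j} \frac{(xy)^k}{x^{\lfloor j/2 \rfloor} (1-x)^{j+1}} \\ &= \sum_j \frac{(1+xy)^{\lfloor j/2 \rfloor} (xy)^j}{x^{\lfloor j/2 \rfloor} (1-x)^{j+1}} \\ &= \sum_i \left[ \frac{(1+xy)^i (xy)^{2i}}{x^i (1-x)^{2i+1}} + \frac{(1+xy)^i (xy)^{2i+1}}{x^i (1-x)^{2i+2}} \right] \\ &= \frac{1 - x + xy}{(1-x)^2} \sum_i \left[ \frac{(1+xy) x y^2}{(1-x)^2} \right]^i \\ &= \frac{1 - x + xy}{(1-x)^2 - (1+xy) x y^2} \\ &= \frac{1 - x + xy}{[1 - x + xy][1 - x(1+y+y^2)]} \\ &= \frac{1}{1 - x(1+y+y^2)}. \end{aligned}$$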
This shows L(x, y) = R(x, y), proving the proposed identity. 
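The identity is easy to test by machine for small m and k. The Python sketch below is an illustration under a stated assumption: the right side is taken to be $\sum_j \binom{\lfloor j/2 \rfloor}{k-j} \binom{m-k+\lfloor 3j/2 \rfloor}{j}$, which is the form consistent with the generating function computed above. The function names are hypothetical. Both sides are also compared with the coefficient of $y^k$ in $(1+y+y^2)^m$ and with a direct count of the k-element subsets of {1, 2, ..., m+k-1} containing no three consecutive integers.

```python
from itertools import combinations
from math import comb

def binom(n, k):
    """Binomial coefficient, taken to be 0 outside the range 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def lhs(m, k):
    return sum(binom(r, k - r) * binom(m, r) for r in range(k + 1))

def rhs(m, k):
    # Assumed form of the right side (see the lead-in).
    return sum(binom(j // 2, k - j) * binom(m - k + (3 * j) // 2, j)
               for j in range(k + 1))

def trinomial_coefficient(m, k):
    """Coefficient of y^k in (1 + y + y^2)^m."""
    coeffs = [1]
    for _ in range(m):
        new = [0] * (len(coeffs) + 2)
        for i, c in enumerate(coeffs):
            new[i] += c
            new[i + 1] += c
            new[i + 2] += c
        coeffs = new
    return coeffs[k] if k < len(coeffs) else 0

def count_subsets(m, k):
    """k-subsets of {1, ..., m+k-1} with no three consecutive integers."""
    count = 0
    for s in combinations(range(1, m + k), k):
        if not any(s[i + 2] - s[i] == 2 for i in range(k - 2)):
            count += 1
    return count

all_agree = all(
    lhs(m, k) == rhs(m, k) == trinomial_coefficient(m, k) == count_subsets(m, k)
    for m in range(1, 6) for k in range(1, 8)
)
print(all_agree)
```

Agreement on a finite range is of course no proof; it only guards against misreading the formula.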


Editorial comment. The technique in this solution is called the Snake Oil method in H. S. 
Wilf, Generatingfunctionology, Academic Press, 1990. Other solvers cited D. Zeilberger, 
“The method of creative telescoping”, J. Symbolic Computation 11 (1991), 195-204 for 
algorithms showing that both sides of the given identity satisfy the recurrence 


—3(m + 1)(m + 2)S(m, k) + (m + 2)(—3k + 7m + 9)S(m + 1, k) 
+(k —2m — 3)(—k + 2m + 4)S(m + 2,k) = 0. 
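
As a quick sanity check, the recurrence can be tested on the left side of the identity for small m and k. This is a minimal sketch, assuming S(m, k) denotes $\sum_r \binom{r}{k-r}\binom{m}{r}$; the function names are illustrative.

```python
from math import comb


def S(m, k):
    """Left side of the identity: sum over r of C(r, k - r) * C(m, r)."""
    # Terms vanish unless k - r <= r, so r starts at ceil(k / 2).
    return sum(comb(r, k - r) * comb(m, r) for r in range((k + 1) // 2, k + 1))


def recurrence_residual(m, k):
    """Value of the recurrence's left-hand side; it should be 0."""
    return (
        -3 * (m + 1) * (m + 2) * S(m, k)
        + (m + 2) * (-3 * k + 7 * m + 9) * S(m + 1, k)
        + (k - 2 * m - 3) * (-k + 2 * m + 4) * S(m + 2, k)
    )


if __name__ == "__main__":
    print(all(recurrence_residual(m, k) == 0
              for m in range(1, 11) for k in range(1, 11)))
```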


In one solution, the sum on the right was split into terms with odd j and terms with even 
j, and recurrences found for these parts separately. This gave a higher order recurrence 
for this expression, but the proof could still be completed by examining a finite number of 
values of m. The proposer observed that the expression on the left is the number of k-element 
subsets of {1, 2, ..., m + k - 1} that contain no three consecutive integers. None of the 
proofs found a combinatorial identification of the expression on the right with this quantity. 
Another aspect of these semi-unfriendly subsets appeared in problem 10343 [93, 874]. 


Solved also by S. B. Ekhad, I. Nemes (Austria), and J. H. Steelman. 


Fields Closed under nth Roots 
10274[1993, 75]. Proposed by Robert E. Byerly, Texas Tech University, Lubbock, TX. 


For odd integers n, let $E_n$ be the smallest subfield of the real numbers closed under the 
function $x \mapsto \sqrt[n]{x}$. 

(a) If $f(x) \in \mathbb{Z}[x]$ is irreducible in $\mathbb{Z}[x]$, show that f(x) has at most one root in $E_3$. 

(b) Are there any such fields $E_n$ and $\mathbb{Z}$-irreducible polynomials f(x) for which f(x) has 
more than one root in $E_n$? 


Solution by Burt Fein and Robby Robson, Oregon State University, Corvallis, OR. 
Let n be an odd number and let $f(x) \in \mathbb{Z}[x]$ be irreducible. We show that f(x) cannot 
have two distinct roots in $E_n$ by proving two lemmas: 


Lemma 1. There are no non-trivial monomorphisms of the field $E_n$ into itself. 


Lemma 2. Let $K \subseteq E_n$ be a subfield and suppose that $\sigma: K \to E_n$ is a monomorphism. 
Then $\sigma$ extends to a monomorphism defined for all of $E_n$. 


These two lemmas imply the result, for if $\alpha$ and $\beta$ are roots of f(x) in $E_n$, then the 
monomorphism $\sigma: \mathbb{Q}(\alpha) \to E_n$ having $\sigma(\alpha) = \beta$ extends to all of $E_n$ by Lemma 2, and 
this must be the identity by Lemma 1. Hence $\alpha = \beta$. 

Proof of Lemma 1. Suppose $\sigma: E_n \to E_n$ is a non-trivial monomorphism. Let L be the 
subfield of $E_n$ fixed by $\sigma$. Since $\sigma$ is non-trivial, L is a proper subfield. By the definition 
of $E_n$ there is an $x \in L$ such that $\sqrt[n]{x} \in E_n - L$. Since $\sigma(\sqrt[n]{x}) \ne \sqrt[n]{x}$, $E_n$ contains two 
real nth roots of x, which is impossible. 


Proof of Lemma 2. After applying Zorn’s Lemma to pairs $(L, \varphi)$ consisting of subfields L 
of $E_n$ containing K and monomorphisms $\varphi: L \to E_n$ extending $\sigma$, we may assume that 
$(K, \sigma)$ is a maximal pair. Our goal is to show that $K = E_n$. 

If not, there is some $a \in K$ with $\sqrt[n]{a} \notin K$. The polynomial $x^n - a$ has only one real 
root and has odd degree. Thus, if we consider its factorization in K[x] into irreducible 
polynomials 

$$x^n - a = f_1(x) f_2(x) \cdots f_m(x),$$

exactly one of the $f_i$ has odd degree. Assume $f_1$ has odd degree, and hence that $\sqrt[n]{a}$ is a 
root of $f_1(x)$. Applying $\sigma$ to the factorization of $x^n - a$, we find that the element $\sqrt[n]{\sigma(a)}$, 
which is in $E_n$ because of the way $E_n$ is defined, is the unique real root of 

$$x^n - \sigma(a) = f_1^{\sigma}(x) f_2^{\sigma}(x) \cdots f_m^{\sigma}(x).$$

Since only $f_1^{\sigma}$ has odd degree, $\sqrt[n]{\sigma(a)}$ is a root of $f_1^{\sigma}(x)$. It is a well-known result (see, for 
example, Theorem 1.8 in T. W. Hungerford, Algebra, Springer-Verlag, 1974, p. 261) that 
this condition is sufficient for $\sigma$ to be extendable to a monomorphism from $K(\sqrt[n]{a})$ into $E_n$ 
sending $\sqrt[n]{a}$ to $\sqrt[n]{\sigma(a)}$. This contradicts the maximality of the pair $(K, \sigma)$ and establishes 
the lemma. 


Solved also by S. Ott, F. Richman, and the proposer. 


An Unsettled Inequality 
10337[1993, 798]. Proposed by Horst Alzer, Waldbröl, Germany. 


Let $n \ge 1$ be an integer. Let $x_1, \ldots, x_n$ be real numbers with $x_i \in (0, 1/2]$. Consider 
the statement 

$$\prod_{i=1}^{n} \frac{x_i}{1 - x_i} \le \frac{\sum_{i=1}^{n} x_i^n}{\sum_{i=1}^{n} (1 - x_i)^n}. \tag{$F_n$}$$

(a) Prove $F_n$ for $n \le 3$. 
(b) Show that $F_n$ is false for $n \ge 6$. 
(c) What about $F_4$ and $F_5$? 


Solution by Michael Vowe, Therwil, Switzerland. 
Part (a): 
$F_1$ is trivial. 
$F_2$ is equivalent to $(1 - x_1 - x_2)(x_1 - x_2)^2 \ge 0$, and thus true. 
$F_3$ can be rewritten as 

$$\frac{1}{2}\sum_{i=1}^{3} (x_i - x_{i+1})^2 \, \sum_{i=1}^{3} x_i (1 - x_i)(1 - x_{i+1} - x_{i+2}) \ge 0$$

(where $x_4 = x_1$ and $x_5 = x_2$) and thus true. 
Part (b): Put $x_1 = x_2 = \cdots = x_{n-1} = \frac{1}{2}$ and $x_n = \frac{1}{5}$. Then $F_n$ is false if 

$$\frac{1}{4} > \frac{(n-1)2^{-n} + (0.2)^n}{(n-1)2^{-n} + (0.8)^n}.$$

This is equivalent to $4^n > 3(n-1)(2.5)^n + 4$, which holds for $n \ge 6$. 


Editorial comment. Thomas L. McCoy investigated general properties of the statements 
$F_n$ using expressions in terms of elementary symmetric polynomials and power sums in 
the $x_i$. These studies allowed easy discovery of the equivalent form of $F_3$ given in the 
selected solution. There were some applications to $F_4$ and $F_5$, but the truth or falsity of 
these statements remains unsettled. The proposer indicated that computer experiments by 
F. Bullock failed to locate any counterexamples to these statements. 


Solved also by T. L. McCoy, H.-J. Seiffert (Germany), GCHQ Problem Solving Group (U. K.), and the proposer. 
One incorrect solution was received. 


Collaborating editors: David F. Appleyard, Paul T. Bateman, Duane M. Broline, Barry 
W. Brunson, Frank S. Cater, Gulbank D. Chakerian, Underwood Dudley, Gerald A. Edgar, 
Michael A. Filaseta, Ira M. Gessel, Richard A. Gibbs, Jerrold R. Griggs, Douglas A. Hensley, 
John R. Isbell, Mourad E. H. Ismail, Murray Klamkin, Daniel J. Kleitman, Frederick W. 
Luttmann, Frank B. Miles, Richard Pfiefer, Stephen L. Portnoy, J. O. Shallit, John Henry 
Steelman, Kenneth B. Stolarsky, David E. Tepper, Douglas B. Tyler, Daniel Ullman, and 
William E. Watkins. 


THE COVER 


David Fowler would like to hear from you if you can recognize the function 
which is plotted on the cover, or if you have ever seen such a plot before. 
The answer will be in an article by him in the January 1996 issue. Contact 
him before January with your answers, and he will report on the response. 
So far, he has shown it to hundreds of people; nobody has seen the plot 
before, and only two people have recognized the underlying function without 
considerable help. Bits of the function, he adds, have been known to 
mathematicians for more than 900 years. 


David Fowler 

Mathematics Institute 
University of Warwick 
Coventry CV4 7AL 
ENGLAND 

dhf@maths.warwick.ac.uk 
Fax +44-1203-523548 


The Fifty-Fifth William Lowell Putnam 
Mathematical Competition 


Leonard F. Klosinski, Gerald L. Alexanderson 
and Loren C. Larson 


The following results of the fifty-fifth William Lowell Putnam Mathematical 
Competition, held on December 3, 1994, have been determined in accordance with 
the governing regulations. This annual contest is supported by the William Lowell 
Putnam Prize Fund for the Promotion of Scholarship, left by Mrs. Putnam in 
memory of her husband, and is held under the auspices of the Mathematical 
Association of America. 

The first prize, $7,500, was awarded to the Department of Mathematics at 
Harvard University. The members of the winning team were: Kiran S. Kedlaya, 
Lenhard L. Ng, and Dylan P. Thurston; each was awarded a prize of $500. 

The second prize, $5,000, was awarded to the Department of Mathematics at 
Cornell University. The members of the winning team were Jeremy L. Bem, 
Robert D. Kleinberg, and Mark Krosky; each was awarded a prize of $400. 

The third prize, $3,000, was awarded to the Department of Mathematics at the 
Massachusetts Institute of Technology. The members of the winning team were 
Henry L. Cohn, Adam W. Meyerson, and Thomas A. Weston; each was awarded a 
prize of $300. 

The fourth prize, $2,000, was awarded to the Department of Mathematics at 
Princeton University. The members of the winning team were William R. Mann, 
Joel E. Rosenberg, and Michail Sunitsky; each was awarded a prize of $200. 

The fifth prize, $1,000, was awarded to the Department of Mathematics of the 
University of Waterloo. The members of the winning team were Ian A. Goldberg, 
Peter L. Milley, and Kevin Purbhoo; each was awarded a prize of $100. 

The five highest ranking individual contestants, in alphabetical order, were 
Jeremy L. Bem, Cornell University; J. P. Grossman, University of Toronto; Kiran 
S. Kedlaya, Harvard University; William R. Mann, Princeton University; and 
Lenhard L. Ng, Harvard University. Each of these was designated a Putnam 
Fellow by the Mathematical Association of America and awarded a prize of $1,000 
by the Putnam Prize Fund. 

The next five highest ranking contestants, in alphabetical order, were 
Soundararajan Kannan, University of Michigan, Ann Arbor; David L. Savitt, 
University of British Columbia; Daniel K. Schepler, Washington University, 
St. Louis; Noam M. Shazeer, Duke University; and Hong Zhou, Harvard Unive- 
rsity; each was awarded a prize of $500. 

The next six highest ranking contestants, in alphabetical order, were Alexandru 
D. Ionescu, Massachusetts Institute of Technology; Robert D. Kleinberg, Cornell 
University; Jacob A. Rasmussen, Princeton University; Andrew H. Schultz, Johns 
Hopkins University; Dylan P. Thurston, Harvard University; and Zhaohui Zhang, 
Yale University; each was awarded a prize of $250. 

The next nine highest ranking contestants, in alphabetical order, were Henry L. 
Cohn, Massachusetts Institute of Technology; Ian A. Goldberg, University of 
Waterloo; Adam Kalai, Harvard University; Serban M. Nacu, Harvard University; 
Joel E. Rosenberg, Princeton University; Mikhail V. Shubov, Texas Tech Univer- 
sity; Jade P. Vinson, Washington University, St. Louis; Stephen S. Wang, Harvard 
University; and Jonathan L. Weinstein, Harvard University. Each was awarded a 
prize of $100. 

The following teams, named in alphabetical order, received honorable mention: 
University of Nebraska, Lincoln, with team members Scott Annin, Igor V. 
Pavlovsky, and Eric M. Smith; New York University, with team members Igor 
Berger, Yevgeniy Dodis, and Mikhail Kogan; University of Toronto, with team 
members J. P. Grossman, Edward Leung, and Naoki Sato; Washington University, 
St. Louis, with team members Ben Gum, Daniel K. Schepler, and Jade P. Vinson; 
and Yale University, with team members Gautam Chinta, Matthew Frank, and 
Zhaohui Zhang. 

Honorable mention was achieved by the following thirty individuals named in 
alphabetical order: Jared E. Anderson, University of Victoria; Federico Ardila, 
Massachusetts Institute of Technology; Bradley S. Bart, University of Waterloo; 
Ruth A. Britto-Pacumio, Massachusetts Institute of Technology; Robert H. Cheng, 
University of British Columbia; Yevgeniy Dodis, New York University; Ron D. 
Dror, Rice University; Alex Heneveld, Princeton University; Randy W. Ho, University 
of Arizona; Jason A. Howald, Miami University; Sergey M. Ioffe, Massachusetts 
Institute of Technology; Dean W. Jens, University of Chicago; Joanna 
L. Karczmarek, Queen’s University; Mikhail Kogan, New York University; Botond 
Koszegi, Harvard University; Mark Krosky, Cornell University; Daniel T. Martin, 
Carleton College; Olexei Ivanovich Motrunich, University of Missouri, Columbia; 
Akira Negi, University of North Carolina, Chapel Hill; An T. Nguyen, University 
of Texas, Austin; Royce Y. Peng, Harvard University; Kevin Purbhoo, University 
of Waterloo; Lawrence P. Roberts, Washington University, St. Louis; Naoki Sato, 
University of Toronto; Sam Spencer, Rice University; Jason M. Starr, University of 
California, Berkeley; Mark A. Van Raamsdonk, University of British Columbia; 
David R. Wasserman, University of California, San Diego; Thomas A. Weston, 
Massachusetts Institute of Technology; and Jeffrey S. Willson, University of 
Chicago. 

The other individuals who achieved ranks among the top 107, in alphabetical 
order of their schools, were: Brown University, Andrew Brecher; California Insti- 
tute of Technology, Wei-Hwa Huang, Roman Muchnik; California Polytechnic 
State University, San Luis Obispo, Robert B. Mathews; University of California, 
Santa Barbara, Aaron S. Cohen; Carleton College, Curtis Z. Mitchell; Case 
Western Reserve University, Neil A. Rubin; Colgate University, Jean-Francois R. 
Lafont; Dartmouth College, Yuan Shen; Duke University, Robert R. Schneck; 
Harvard University, Manjul Bhargava, Dean R. Chung, Joe B. Fendel, Sergey V. 
Levin, Paul Li, Harrison K. Tsai, Jiří J. L. Vaníček; Harvey Mudd College, Aaron 
F. Archer, Kan Yasuda; University of Illinois, Champaign-Urbana, Ivan 
Auramovic, Kwong Shing Lin; Massachusetts Institute of Technology, Adam W. 
Meyerson, Michael B. Schulz, Michael R. Tehranchi, Aleksey Zinger; McGill 
University, Jacob Eliosoff; University of Nebraska, Lincoln, Eric M. Smith; New 
York University, Igor Berger; University of North Carolina, Chapel Hill, Paul E. 
Rube; Northwestern University, Carol R. James; Princeton University, Paul J. 
Ellis, Michael J. Goldberg, Mark W. Lucianovic; Queen’s University, Peter 
Gregory Zion; Reed College, Gerald D. Larson; Rice University, Ashley M. 
Reiter; University of Saskatchewan, Trevor N. Green; University of the South, 
Qingshan Luo; Stanford University, Robert G. Au, Heyning A. Cheng, Loren L. 
Looger; Suffolk University, Anna V. Petrovskaya; Vanderbilt University, Jason D. 
Hughes; Washington University, St. Louis, Ian F. Pulizzotto, Erik N. Vee; Univer- 
sity of Waterloo, Jason P. Bell, Jie J. Lou, Peter L. Milley, Lousindi R. Sabourin; 
Williams College, Jason R. Schweinsberg, Edward W. Welsh; and Yale University, 
Matthew Frank. 

The Elizabeth Lowell Putnam Prize, named for the wife of William Lowell 
Putnam and to be “awarded periodically to a woman whose performance on the 
Competition has been deemed particularly meritorious,” is awarded this year to 
Ruth A. Britto-Pacumio of the Massachusetts Institute of Technology. The winner 
is awarded a prize of $500. 

There were 2,314 individual contestants from the 410 colleges and universities 
in Canada and the United States in the competition of December 3, 1994. Teams 
were entered by 284 institutions. The Questions Committee for the fifty-fifth 
competition consisted of Eugene M. Luks, University of Oregon, chair; Fan 
Chung, Bellcore; and Mark I. Krusemeyer, Carleton College; they composed the 
problems listed below and were most prominent among those suggesting solutions. 


PROBLEMS 


Problem A-1. Suppose that a sequence $a_1, a_2, a_3, \ldots$ satisfies $0 < a_n \le a_{2n} + a_{2n+1}$ 
for all $n \ge 1$. Prove that the series $\sum_{n=1}^{\infty} a_n$ diverges. 


Problem A-2. Let A be the area of the region in the first quadrant bounded by the 
line $y = \frac{1}{2}x$, the x-axis, and the ellipse $\frac{1}{9}x^2 + y^2 = 1$. Find the positive number m 
such that A is equal to the area of the region in the first quadrant bounded by the 
line y = mx, the y-axis, and the ellipse $\frac{1}{9}x^2 + y^2 = 1$. 


Problem A-3. Show that if the points of an isosceles right triangle of side length 1 
are each colored with one of four colors, then there must be two points of the 
same color which are at least a distance $2 - \sqrt{2}$ apart. 


Problem A-4. Let A and B be 2 × 2 matrices with integer entries such that A, 
A + B, A + 2B, A + 3B, and A + 4B are all invertible matrices whose inverses 
have integer entries. Show that A + 5B is invertible and that its inverse has 
integer entries. 


Problem A-5. Let $(r_n)_{n \ge 0}$ be a sequence of positive real numbers such that 
$\lim_{n \to \infty} r_n = 0$. Let S be the set of numbers representable as a sum 

$$r_{i_1} + r_{i_2} + \cdots + r_{i_{1994}},$$

with $i_1 < i_2 < \cdots < i_{1994}$. Show that every nonempty interval (a, b) contains a 
nonempty subinterval (c, d) that does not intersect S. 


Problem A-6. Let $f_1, f_2, \ldots, f_{10}$ be bijections of the set of integers such that for 
each integer n, there is some composition $f_{i_1} \circ f_{i_2} \circ \cdots \circ f_{i_m}$ of these functions 
(allowing repetitions) which maps 0 to n. Consider the set of 1024 functions 

$$\mathcal{F} = \{ f_1^{e_1} \circ f_2^{e_2} \circ \cdots \circ f_{10}^{e_{10}} \},$$

$e_i = 0$ or 1 for $1 \le i \le 10$. ($f_i^0$ is the identity function and $f_i^1 = f_i$.) Show that if A 
is any nonempty finite set of integers, then at most 512 of the functions in $\mathcal{F}$ map 
A to itself. 


Problem B-1. Find all positive integers that are within 250 of exactly 15 perfect 
squares. 


Problem B-2. For which real numbers c is there a straight line that intersects the 
curve 


$$y = x^4 + 9x^3 + cx^2 + 9x + 4$$


in four distinct points? 


Problem B-3. Find the set of all real numbers k with the following property: For 
any positive, differentiable function f that satisfies f’(x) > f(x) for all x, there is 
some number N such that $f(x) > e^{kx}$ for all x > N. 


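Problem B-4. For $n \ge 1$, let $d_n$ be the greatest common divisor of the entries of 
$A^n - I$, where 

$$A = \begin{pmatrix} 3 & 2 \\ 4 & 3 \end{pmatrix} \quad\text{and}\quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

Show that $\lim_{n \to \infty} d_n = \infty$. 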


Problem B-5. For any real number $\alpha$, define the function $f_\alpha(x) = \lfloor \alpha x \rfloor$. Let n be a 
positive integer. Show that there exists an $\alpha$ such that for $1 \le k \le n$, 

$$f_\alpha^k(n^2) = n^2 - k = f_{\alpha^k}(n^2).$$

Problem B-6. For any integer a, set 

$$n_a = 101a - 100 \cdot 2^a.$$

Show that for $0 \le a, b, c, d \le 99$, $n_a + n_b \equiv n_c + n_d \pmod{10100}$ implies 
{a, b} = {c, d}. 


SOLUTIONS. In the 12-tuples $(n_{10}, n_9, n_8, n_7, n_6, n_5, n_4, n_3, n_2, n_1, n_0, n_{-1})$ 
following each problem number below, $n_i$ for $10 \ge i \ge 0$ is the number of students 
among the top 206 contestants achieving i points for the problem and $n_{-1}$ is the 
number of those not submitting solutions. 


A-1 (59,59, 54, 21, 0, 0, 0, 0, 8, 0, 3, 2) 


Solution. Let $b_1 = a_1$, $b_2 = a_2 + a_3$, $b_3 = a_4 + a_5 + a_6 + a_7$, and in general, 
$b_n = a_{2^{n-1}} + a_{2^{n-1}+1} + \cdots + a_{2^n - 1}$. An easy induction, using the condition 
$a_n \le a_{2n} + a_{2n+1}$, shows that $b_n \le b_{n+1}$ for all $n \ge 1$. Thus, for any positive integer t, 

$$\sum_{n=1}^{\infty} a_n \ge \sum_{n=1}^{2^t - 1} a_n = \sum_{n=1}^{t} b_n \ge t b_1 = t a_1.$$

This shows that $\sum_{n=1}^{\infty} a_n$ diverges. 


A-2 (169, 3, 2, 0, 0, 0, 0, 0, 1, 3, 22, 6) 


Solution. The linear transformation given by $x_1 = x/3$, $y_1 = y$ transforms the 
region R bounded by $y = x/2$, the x-axis, and the ellipse $x^2/9 + y^2 = 1$ into the 
region R′ bounded by $y_1 = 3x_1/2$, the $x_1$-axis, and the circle $x_1^2 + y_1^2 = 1$; it also 
transforms the region S bounded by y = mx, the y-axis, and $x^2/9 + y^2 = 1$ into the 
region S′ bounded by $y_1 = 3mx_1$, the $y_1$-axis, and the circle. Since all areas are 
multiplied by the same (nonzero) factor under the transformation, R and S have 
the same area if and only if R′ and S′ have the same area. However, we can see by 
symmetry about the line $y_1 = x_1$ that this happens if and only if 3m = 2/3, that is, 
m = 2/9. 
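
A numerical cross-check of this answer is straightforward. The sketch below assumes the ellipse is $x^2/9 + y^2 = 1$ and the first line is $y = x/2$ (the reading consistent with the answer m = 2/9), and integrates $\frac{1}{2}r(\theta)^2$ in polar coordinates with the midpoint rule. The function name `sector_area` and the step count are illustrative choices.

```python
import math


def sector_area(theta_lo, theta_hi, steps=20_000):
    """Area of the part of x^2/9 + y^2 <= 1 between two polar angles (midpoint rule)."""
    h = (theta_hi - theta_lo) / steps
    total = 0.0
    for i in range(steps):
        t = theta_lo + (i + 0.5) * h
        # On the ellipse, r^2 = 1 / (cos^2(t)/9 + sin^2(t)).
        r_squared = 1.0 / (math.cos(t) ** 2 / 9.0 + math.sin(t) ** 2)
        total += 0.5 * r_squared * h
    return total


# Region between the x-axis and the line y = x/2.
area_R = sector_area(0.0, math.atan(0.5))
# Region between the line y = (2/9)x and the y-axis.
area_S = sector_area(math.atan(2 / 9), math.pi / 2)

if __name__ == "__main__":
    print(area_R, area_S)
```

Both numbers should agree with the closed form $\frac{3}{2}\arctan\frac{3}{2}$ up to discretization error.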


A-3 (0, 10, 67, 0, 0, 0, 0, 0, 30, 31, 40, 28) 


Solution. Suppose the vertices of the isosceles right triangle are (0, 0), (1, 0), (0, 1). 
Suppose the points of the triangle can be colored in four colors such that points of 
the same color are always less than a distance $2 - \sqrt{2}$ apart. Then the four points 
$(0, 1)$, $(0, \sqrt{2} - 1)$, $(\sqrt{2} - 1, 0)$, $(1, 0)$ must have different colors, say colors 
A, B, C, D respectively. The point (0, 0) must be of color B or C. Without loss of 
generality, say (0, 0) is of color B. Then the point $(\sqrt{2} - 1, 2 - \sqrt{2})$ is of distance 
at least $2 - \sqrt{2}$ to points of each of the four colors, and this is impossible. 


A-4 (12,17, 20, 0, 0, 0, 0, 0, 15, 3, 43, 96) 


Solution. A matrix C with integer entries has an inverse with integer entries if and 
only if $\det C = \pm 1$. Therefore, if we consider the function f defined by f(x) = 
det(A + xB), we know that the five values f(0), f(1), f(2), f(3), and f(4) must all 
be 1 or −1, so f takes on at least one of those values three or more times. 
However, f(x) is a polynomial of degree $\le 2$ in x, and so f can only take on a 
value more than twice if f is constant. Thus f(x) is one of the constants 1 and −1; 
in particular, $\det(A + 5B) = \pm 1$, so A + 5B has an inverse with integer entries. 


A-5 (20, 13, 4, 0, 0, 0, 0, 0, 6, 2, 57, 104) 


Solution 1. It suffices to show that any sequence in S contains a monotonically 
nonincreasing subsequence. For then, letting $(t_n)_{n \ge 0}$ be any strictly increasing 
sequence within (a, b), some (in fact, all but a finite number) of the intersections 
$S \cap (t_n, t_{n+1})$ would have to be empty, otherwise one could form a strictly 
increasing sequence $(s_n)_{n \ge 0}$ by taking $s_n \in S \cap (t_n, t_{n+1})$. 

Let $(s_n)_{n \ge 0}$ be a sequence in S. For n = 0, 1, 2, ... write 

$$s_n = r_{f(n,1)} + r_{f(n,2)} + \cdots + r_{f(n,1994)} \quad\text{with}\quad f(n,1) < f(n,2) < \cdots < f(n,1994).$$

The sequence $(r_{f(n,1)})_{n \ge 0}$ has a monotonically nonincreasing subsequence (since 
$(r_n)_{n \ge 0}$ is a positive sequence converging to 0). Thus we may replace $(s_n)_{n \ge 0}$ by a 
subsequence for which $(r_{f(n,1)})_{n \ge 0}$ is monotonically nonincreasing. In a similar 
fashion, we pass to subsequences so that, successively, each of $(r_{f(n,2)})_{n \ge 0}$, 
$(r_{f(n,3)})_{n \ge 0}$, ..., $(r_{f(n,1994)})_{n \ge 0}$ may be assumed to be monotonically nonincreasing. 
The resulting $(s_n)_{n \ge 0}$ is monotonically nonincreasing. 


Solution 2. Let C be the set $\{r_n\}_{n \ge 0} \cup \{0\}$. Since C is compact, the set S′ of 
numbers representable as a sum of 1994 elements of C is also compact (for 
example, it is a continuous image of $C^{1994}$). Clearly $S \subseteq S'$. 

Let (a, b) be a nonempty open interval. Since S′ is countable, (a, b) \ S′ is 
nonempty; it is open since S′ is closed. Hence (a, b) \ S′ includes a nonempty 
open interval. 

Comment: This proof generalizes to give the same conclusion for any convergent 
sequence $(r_n)_{n \ge 0}$. 


A-6 (5, 8, 10, 0, 0, 0, 0, 0, 7, 4, 34, 138) 


Solution. Let A be a nonempty finite subset of the integers Z. By the Pigeonhole 
Principle, any bijection of Z which maps A to itself must be a bijection when 
restricted to A; in particular, its inverse also maps A to itself. Note that not all the 
bijections $f_1, f_2, \ldots, f_{10}$ can map A to itself, for otherwise if $0 \in A$ we could not 
map 0 to any $n \notin A$ by a composition $f_{i_1} \circ f_{i_2} \circ \cdots \circ f_{i_m}$, while if $0 \notin A$, we could 
not map 0 to any $n \in A$ by such a composition. 

Let k be the largest integer such that $f_k$ does not map A to itself, and 
suppose that more than 512 of the functions in $\mathcal{F}$ map A to itself. We can write $\mathcal{F}$ as 
a disjoint union of unordered pairs of functions such that two compositions 
$f_1^{e_1} \circ f_2^{e_2} \circ \cdots \circ f_{10}^{e_{10}}$ and $f_1^{d_1} \circ f_2^{d_2} \circ \cdots \circ f_{10}^{d_{10}}$ are in the same pair when they differ 
only in the k-th exponent; that is, when $e_i = d_i$ for $i \ne k$. By the Pigeonhole 
Principle, there is then at least one of these 512 pairs in which both functions map 
A to itself. Since all $f_i$ with i > k also map A to itself, we can use composition 
with the inverses of the $f_i$, as needed, to conclude that for some $e_1, \ldots, e_{k-1}$, 
$F_1 = f_1^{e_1} \circ f_2^{e_2} \circ \cdots \circ f_{k-1}^{e_{k-1}}$ and $F_2 = f_1^{e_1} \circ f_2^{e_2} \circ \cdots \circ f_{k-1}^{e_{k-1}} \circ f_k$ both map A to itself. 
But then $F_1^{-1} \circ F_2 = f_k$ also maps A to itself, a contradiction. 


B-1 (45, 26, 57, 0, 0, 0, 0, 0, 42, 28, 6, 2) 


Solution. Answer: $\{N \mid 315 \le N \le 325 \text{ or } 332 \le N \le 350\}$. 

Assume N > 0 is within 250 of the 15 squares $m^2, (m+1)^2, \ldots, (m+14)^2$, 
where we can take $m \ge 0$. In fact, m will then be positive, otherwise N would be 
within 250 of the additional square 225. We have the necessary and sufficient 
conditions 

$$(m+14)^2 \le N + 250 \le (m+15)^2 - 1,$$
$$(m-1)^2 + 1 \le N - 250 \le m^2.$$

Subtracting (reversing inequalities in the second line), we get 

$$28m + 196 \le 500 \le 32m + 222,$$

which implies m = 9 or 10. 
If m = 9, 
$$23^2 \le N + 250 \le 24^2 - 1,$$
$$8^2 + 1 \le N - 250 \le 9^2,$$
or $315 \le N \le 325$. 
If m = 10, 
$$24^2 \le N + 250 \le 25^2 - 1,$$
$$9^2 + 1 \le N - 250 \le 10^2,$$
or $332 \le N \le 350$. 
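
The answer can be confirmed by direct enumeration. This is a minimal sketch that, like the solution, counts 0 as a perfect square; the search bound 5000 is an arbitrary illustrative cutoff, and the function name is hypothetical.

```python
from math import isqrt


def squares_within(n, radius=250):
    """Number of perfect squares s = k*k (k >= 0) with |n - s| <= radius."""
    # Any qualifying square is at most n + radius, so k <= isqrt(n + radius).
    return sum(1 for k in range(isqrt(n + radius) + 1) if abs(n - k * k) <= radius)


answer = [n for n in range(1, 5000) if squares_within(n) == 15]

if __name__ == "__main__":
    print(answer[0], answer[-1], len(answer))
```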


B-2 (28, 8, 49, 0, 0, 0, 0, 0, 56, 10, 39, 16) 


Solution. Answer: For the real numbers c with c < 243/8. 

The constant term and the coefficient of x in a quartic p(x) are irrelevant in 
determining whether there is a line intersecting y = p(x) in four points. We may 
also replace p(x) by p(x − a) for any real a. Thus, we may replace the given quartic 
$p(x) = x^4 + 9x^3 + cx^2 + 9x + 4$ with $p(x - 9/4) = x^4 + (c - 243/8)x^2 + \cdots$, 
and drop the last two coefficients (we need never calculate them). 

The problem then is to determine the values of c for which there is a straight 
line that intersects $y = x^4 + (c - 243/8)x^2$ in four distinct points. The result is 
now apparent from the shapes of the curves $y = x^4 + ax^2$. For example, we may 
note that when a < 0, this “W-shaped” curve has a relative maximum at x = 0, so 
that horizontal lines $y = -\varepsilon$ for small positive $\varepsilon$ intersect the curve in four 
points, while for $a \ge 0$, the curve is always concave upward, so that no line can 
intersect it in more than two points. 


B-3 (27, 10, 8, 5, 0, 0, 0, 2, 45, 49, 15, 45) 


Solution. The desired set is $(-\infty, 1)$. 

To show this, first note that if $k \ge 1$ were in the set, then k = 1 would also be 
in the set. However, if f is any function of the form $f(x) = g(x)e^x$, where g is a 
positive, increasing, differentiable function bounded by 0 and 1 (for example, 
$g(x) = (1/\pi)\arctan x + \frac{1}{2}$), we have $f'(x) = e^x(g'(x) + g(x)) > f(x)$ and 
$f(x) < e^x$ for all x, so k = 1 is not in the set. 

On the other hand, if f′(x) > f(x) for all x, then (since f is positive) we have 

$$\frac{f'(x)}{f(x)} > 1 \quad\text{for all } x,$$
$$\int_0^x \frac{f'(t)}{f(t)}\,dt > \int_0^x dt \quad\text{for all } x > 0,$$
$$\log(f(x)) > x + \log(f(0)) \quad\text{for all } x > 0,$$
$$f(x) > f(0)e^x \quad\text{for all } x > 0.$$

If k is any number less than 1, then for large enough x we will have $f(0)e^x > e^{kx}$ 
(since f(0) is positive), which shows that k is in the set. 


B-4 (15, 1, 4, 0, 0, 0, 0, 0, 5, 22, 71, 88) 


Solution 1. From experimentation (and then an easy induction on n) we see that 
$A^n$ has the form 

$$A^n = \begin{pmatrix} a_n & b_n \\ 2b_n & a_n \end{pmatrix}$$

with $a_n$ odd, and, since $\det A^n = 1$, we have $a_n^2 - 1 = 2b_n^2$. Thus $a_n - 1$ divides 
$2b_n^2$, so that $d_n = \gcd(a_n - 1, b_n) \ge \sqrt{(a_n - 1)/2}$. Since $\lim_{n \to \infty} a_n = \infty$ (e.g., 
$a_n > 3a_{n-1}$), the result follows. 
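
To see the growth concretely, the first few values of $d_n$ can be computed directly. The sketch below takes $A = \begin{pmatrix} 3 & 2 \\ 4 & 3 \end{pmatrix}$, the matrix consistent with the form of $A^n$ and the eigenvalues used in the solutions, and also checks the bound $2d_n^2 \ge a_n - 1$, which is the squared form of the inequality as read here. The helper names are illustrative.

```python
from math import gcd

A = [[3, 2], [4, 3]]


def mat_mul(x, y):
    """Product of two 2 x 2 integer matrices."""
    return [
        [x[0][0] * y[0][0] + x[0][1] * y[1][0], x[0][0] * y[0][1] + x[0][1] * y[1][1]],
        [x[1][0] * y[0][0] + x[1][1] * y[1][0], x[1][0] * y[0][1] + x[1][1] * y[1][1]],
    ]


def d_values(count):
    """Return [d_1, ..., d_count], where d_n = gcd of the entries of A^n - I."""
    values = []
    power = [[1, 0], [0, 1]]
    for _ in range(count):
        power = mat_mul(power, A)
        g = gcd(gcd(power[0][0] - 1, power[0][1]), gcd(power[1][0], power[1][1] - 1))
        values.append(g)
    return values


def bound_holds(count):
    """Check 2 * d_n^2 >= a_n - 1 for n = 1..count, with a_n the top-left entry of A^n."""
    power = [[1, 0], [0, 1]]
    for d in d_values(count):
        power = mat_mul(power, A)
        if 2 * d * d < power[0][0] - 1:
            return False
    return True


if __name__ == "__main__":
    print(d_values(8))
```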


Solution 2. Define the sequence $r_0, r_1, r_2, \ldots$ by $r_0 = 0$, $r_1 = 1$, and 
$r_k = 6r_{k-1} - r_{k-2}$ for k > 1. We first show by induction on k that 

$$A^n - I = r_{k+1}(A^{n-k} - A^k) - r_k(A^{n-k-1} - A^{k+1}) \quad\text{for } k \ge 0. \tag{1}$$

This is clear for k = 0 and, for the inductive step, using $A^2 - 6A + I = 0$ (the 
characteristic equation), we have 

$$
\begin{aligned}
r_{k+1}(A^{n-k} - A^k) &- r_k(A^{n-k-1} - A^{k+1}) \\
&= r_{k+1}\bigl((6A^{n-k-1} - A^{n-k-2}) - (6A^{k+1} - A^{k+2})\bigr) - r_k(A^{n-k-1} - A^{k+1}) \\
&= (6r_{k+1} - r_k)(A^{n-k-1} - A^{k+1}) - r_{k+1}(A^{n-k-2} - A^{k+2}) \\
&= r_{k+2}(A^{n-k-1} - A^{k+1}) - r_{k+1}(A^{n-k-2} - A^{k+2}).
\end{aligned}
$$

Applying (1) with $k = \lfloor n/2 \rfloor$, we obtain 

$$A^n - I = \begin{cases} r_{n/2}\,(A^{n/2+1} - A^{n/2-1}), & \text{if } n \text{ is even}, \\ (r_{(n+1)/2} + r_{(n-1)/2})\,(A^{(n+1)/2} - A^{(n-1)/2}), & \text{if } n \text{ is odd}. \end{cases}$$

In either case, the entries of $A^n - I$ have a common factor that tends to $\infty$ since 
$\lim_{n \to \infty} r_n = \infty$ (e.g., $r_n > 5r_{n-1}$ for n > 1). 


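Solution 3. We know that the entries of $A^n$ are each of the form $\alpha_1\lambda_1^n + \alpha_2\lambda_2^n$ 
where $\lambda_1 = 3 + 2\sqrt{2}$ and $\lambda_2 = 3 - 2\sqrt{2}$ (the eigenvalues of A). So, using the 
entries for n = 1, 2, we derive 

$$A^n = \begin{pmatrix} \dfrac{\lambda_1^n + \lambda_2^n}{2} & \dfrac{\lambda_1^n - \lambda_2^n}{2\sqrt{2}} \\[2ex] \dfrac{\lambda_1^n - \lambda_2^n}{\sqrt{2}} & \dfrac{\lambda_1^n + \lambda_2^n}{2} \end{pmatrix}.$$

Observing that $\lambda_i = \mu_i^2$, where $\mu_1 = 1 + \sqrt{2}$ and $\mu_2 = 1 - \sqrt{2}$, we see 

$$
\begin{aligned}
d_n &= \gcd\left(\frac{\lambda_1^n + \lambda_2^n}{2} - 1,\ \frac{\lambda_1^n - \lambda_2^n}{2\sqrt{2}}\right) \\
&= \gcd\left(\frac{(\mu_1^n - \mu_2^n)^2}{2},\ \frac{(\mu_1^n - \mu_2^n)(\mu_1^n + \mu_2^n)}{2\sqrt{2}}\right) \\
&= \frac{\mu_1^n - \mu_2^n}{\sqrt{2}}\,\gcd\left(\frac{\mu_1^n - \mu_2^n}{\sqrt{2}},\ \frac{\mu_1^n + \mu_2^n}{2}\right),
\end{aligned}
$$

since $(\mu_1^n - \mu_2^n)/\sqrt{2}$ and $(\mu_1^n + \mu_2^n)/2$ are rational integers. As $|\mu_1| > 1$ and 
$|\mu_2| < 1$, we conclude $\lim_{n \to \infty} (\mu_1^n - \mu_2^n) = \infty$. Hence, $\lim_{n \to \infty} d_n = \infty$. 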


Comment: The proof extends to establishing the same result for integral matrices 
$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ of determinant 1 and |trace(A)| > 1 (the latter to guarantee $r_n \to \infty$, 
where $r_n = \operatorname{trace}(A)\, r_{n-1} - r_{n-2}$). A similar argument gives the same conclusion 
for the entries of A” — J. 


B-5 (11, 4, 4, 0, 0, 0, 0, 0, 32, 10, 15, 130) 

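Solution. For any $\alpha \ge 0$ and any positive integer k, we have 

$$f_\alpha^k(n^2) = \lfloor \alpha \lfloor \alpha \cdots \lfloor \alpha n^2 \rfloor \cdots \rfloor \rfloor \le \lfloor \alpha \cdot \alpha \cdots \alpha\, n^2 \rfloor = f_{\alpha^k}(n^2),$$

so it is enough to show that there exists an $\alpha \ge 0$ such that for $1 \le k \le n$, 

$$f_\alpha^k(n^2) \ge n^2 - k \quad\text{and}\quad \alpha^k n^2 < n^2 - k + 1.$$

For k = 1, the first of these two inequalities yields $\alpha n^2 \ge n^2 - 1$; we will show 
that $\alpha = (n^2 - 1)/n^2 = 1 - 1/n^2$ will do. Using this value of $\alpha$, we use induction 
on k to show that $f_\alpha^k(n^2) \ge n^2 - k$ for $1 \le k \le n$; in fact, if this holds for k, we 
have 

$$f_\alpha^{k+1}(n^2) \ge f_\alpha(n^2 - k) = \left\lfloor \left(1 - \frac{1}{n^2}\right)(n^2 - k) \right\rfloor = \left\lfloor n^2 - k - 1 + \frac{k}{n^2} \right\rfloor = n^2 - (k+1),$$

completing the induction. 
To show that $\alpha^k n^2 < n^2 - k + 1$, note that this inequality is clear when n = 1 
and hence k = 1, $\alpha = 0$; for n > 1, the inequality is equivalent to 

$$\alpha^k < 1 - \frac{k-1}{n^2},$$
$$\left(\frac{n^2 - 1}{n^2}\right)^k < 1 - \frac{k-1}{n^2},$$
$$\left(\frac{n^2}{n^2 - 1}\right)^k > \frac{1}{1 - \dfrac{k-1}{n^2}},$$
$$\left(\frac{n^2}{n^2 - 1}\right)^{k-1} > \frac{n^2 - 1}{n^2} \cdot \frac{1}{1 - \dfrac{k-1}{n^2}} = \frac{n^2 - 1}{n^2 - k + 1}.$$

Now, 

$$\left(\frac{n^2}{n^2 - 1}\right)^{k-1} = \left(1 + \frac{1}{n^2 - 1}\right)^{k-1} \ge 1 + \frac{k-1}{n^2 - 1} = \frac{n^2 + k - 2}{n^2 - 1},$$

and it is easy to see by cross-multiplication that for $1 \le k \le n$, 

$$\frac{n^2 + k - 2}{n^2 - 1} > \frac{n^2 - 1}{n^2 - k + 1},$$

completing the proof. 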
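
The choice $\alpha = 1 - 1/n^2$ can be checked with exact rational arithmetic. This is a minimal sketch for small n; `f` and `check` are illustrative names, and `a` stands for the solution's $\alpha$.

```python
from fractions import Fraction
from math import floor


def f(a, x):
    """f_a(x) = floor(a * x), computed exactly for rational a."""
    return floor(a * x)


def check(n):
    """Verify f_a^k(n^2) = n^2 - k = f_{a^k}(n^2) for 1 <= k <= n, with a = 1 - 1/n^2."""
    a = 1 - Fraction(1, n * n)
    x = n * n
    for k in range(1, n + 1):
        x = f(a, x)  # x is now the k-fold iterate of f_a applied to n^2
        if not (x == n * n - k == f(a ** k, n * n)):
            return False
    return True


if __name__ == "__main__":
    print(all(check(n) for n in range(1, 30)))
```

Using `Fraction` rather than floating point avoids rounding errors near the integer boundaries where the floor function jumps.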

B-6 (14, 11, 1, 0, 0, 0, 0, 0, 16, 10, 50, 104) 


Solution. Observe that $n_a \equiv a \pmod{100}$ and $n_a \equiv 2^a \pmod{101}$. 
Suppose $n_a + n_b \equiv n_c + n_d \pmod{10100}$. Then $n_a + n_b \equiv n_c + n_d \pmod{101}$, 
so 

$$2^a + 2^b \equiv 2^c + 2^d \pmod{101}. \tag{1}$$

Also, $n_a + n_b \equiv n_c + n_d \pmod{100}$, so $a + b \equiv c + d \pmod{100}$, and therefore, by 
Fermat’s Theorem (since 101 is prime), $2^{a+b} \equiv 2^{c+d} \pmod{101}$. That is, 

$$2^a \cdot 2^b \equiv 2^c \cdot 2^d \pmod{101}. \tag{2}$$

From (1) and (2), we see that $\{2^a, 2^b\}$ and $\{2^c, 2^d\}$ are the same set modulo 101, 
namely, the set of roots of the quadratic polynomial $(x - 2^a)(x - 2^b) = x^2 - 
(2^a + 2^b)x + 2^a 2^b = (x - 2^c)(x - 2^d)$ in the field $\mathbb{Z}_{101}$. To see that {a, b} = {c, d}, 
it suffices to show that the numbers $2^a$ for $a \in \{0, 1, \ldots, 99\}$ are distinct modulo 
101. That is, we need to show that the order of 2 modulo 101 is precisely 100. For 
this, it suffices to show that $2^{20} \not\equiv 1 \pmod{101}$ and $2^{50} \not\equiv 1 \pmod{101}$. We have 
$2^{10} = 1024 \equiv 14 \pmod{101}$, so that $2^{20} \equiv 14^2 \equiv -6 \pmod{101}$, from which 
$2^{50} = 2^{20} \cdot 2^{20} \cdot 2^{10} \equiv 36 \cdot 14 \equiv -1 \pmod{101}$. 
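
Both the order computation and the conclusion can be verified by brute force. The sketch below assumes $n_a = 101a - 100 \cdot 2^a$ with $0 \le a \le 99$, as in the congruences used in the solution; the function names are illustrative.

```python
MODULUS = 10100


def n(a):
    """n_a = 101a - 100 * 2^a."""
    return 101 * a - 100 * 2 ** a


def order_of_two(p=101):
    """Multiplicative order of 2 modulo the odd prime p."""
    k, value = 1, 2 % p
    while value != 1:
        value = (value * 2) % p
        k += 1
    return k


def pair_sums_distinct():
    """True if n_a + n_b (mod 10100) determines the unordered pair {a, b}."""
    seen = set()
    for a in range(100):
        for b in range(a, 100):
            s = (n(a) + n(b)) % MODULUS
            if s in seen:
                return False
            seen.add(s)
    return True


if __name__ == "__main__":
    print(order_of_two(), pair_sums_distinct())
```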


Klosinski / Alexanderson: Larson: 

Department of Mathematics Department of Mathematics 
Santa Clara University St. Olaf College 

Santa Clara, CA 95053 Northfield, MN 55057 


PICTURE PUZZLE 
(from the collection of Paul Halmos) 




Hint: He doesn’t look the same now as he did in 1951. 
(see page 690) 


Mathematics: Questions and Answers 


Beno Eckmann 
Translated by Peter Hilton* 


The International Congress of Mathematicians, which takes place every 4 years 
and which is being held this time in Zurich, opens today. It brings together some 
3000 mathematicians, active in research and university teaching, from all over the 
world. Not only in view of the (temporary) inundation of the city with this 
particular species of scientist but also at other times is the question often asked, 
what do mathematicians really do? In what follows I will try to give some small 
insight into the nature and processes of this science. 


Fermat’s Theorem. In June 1993 a sensational report went round the mathemati- 
cal world; by electronic mail it reached even faraway universities, academies and 
colleges with lightning speed. The famous 350-year-old Fermat Theorem had been 
proved. To the surprise of most mathematicians this report was also published in 
many non-specialist media, thereby reaching a broad public. The New York Times 
devoted a front page article to it and Andrew Wiles (Princeton), who had 
announced the proof¹, along with those who had prepared the way, especially 
Gerhard Frey (Essen) and Kenneth Ribet (Berkeley), became as famous overnight 
as the stars of the arts and sport. The problem, unsolved for 350 years, seemed to 
exercise as strong a fascination for laymen as for specialists. 

For once it was neither the powerful technico-scientific applications nor the 
attractive coloured computer pictures and graphics which excited a large public, 
but the actual mathematical process—and this with an unprecedented intensity. It 
was therefore only to be expected that, over the ensuing days, we mathematicians 
were bombarded with questions from all sides. Did we take the opportunity to 
make people, near and far, more familiar with our science? For the questions, on 
the whole, went to the heart of the matter. 

The basic underlying question in Fermat’s Theorem should be explained for the 
sake of completeness. The equation $x^2 + y^2 = z^2$ has many solutions in integers, 
for example, $x = 3$, $y = 4$, $z = 5$. On the other hand, the equation $x^3 + y^3 = z^3$ 
has no solutions in integers, and Fermat asserted, in 1635, that he could prove that 
the equation $x^n + y^n = z^n$, for $n > 2$, has no solutions in integers (except, of 


* The original text of this article, in German, appeared in the Swiss newspaper Neue Zürcher Zeitung 
on August 3, 1994, to mark the opening of the International Congress of Mathematicians. It was 
suggested to the author, Beno Eckmann, that an English version would be welcomed and would reach a 
wider public. The translation was undertaken by Peter Hilton. 

¹At the time of writing (8/3/94, translator) it appears to experts that there is a gap in the 
complicated chain of inferences constructed by A. Wiles. This does not imply false reasoning, but rather 
that the argument must be supplemented. The great achievement of Wiles is only marginally affected 
by this. 

Added by translator (2/21/95)—It has now been announced by Andrew Wiles and R. L. Taylor 
(Cambridge), and verified by colleagues, that the gap has been filled. 
course, $x = y = z = 0$). We may doubt whether he really had a proof. The problem 
seems simple and innocuous; it has aroused the interest of many amateur and 
professional mathematicians over the centuries, and many false proofs have been 
offered. 
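
The contrast between the exponents 2 and 3 can be explored numerically. The following Python sketch is an illustration only and is not part of the original article; the function name `solutions` and the search bound of 30 are arbitrary choices made for this example. It searches for positive integer solutions of $x^n + y^n = z^n$ within a small range.

```python
def solutions(n, limit):
    """Return all (x, y, z) with 1 <= x <= y and z <= limit such that x**n + y**n == z**n.

    Hypothetical helper written for this illustration; a brute-force search only.
    """
    # Map each n-th power back to its base so that membership tests are cheap.
    powers = {z ** n: z for z in range(1, limit + 1)}
    found = []
    for x in range(1, limit + 1):
        for y in range(x, limit + 1):
            z = powers.get(x ** n + y ** n)
            if z is not None:
                found.append((x, y, z))
    return found


squares = solutions(2, 30)   # contains Pythagorean triples such as (3, 4, 5)
cubes = solutions(3, 30)     # empty within this range
fourths = solutions(4, 30)   # empty within this range
```

A search of this kind finds triples such as (3, 4, 5) for n = 2 and nothing for n = 3 or n = 4 in the range examined. As the article goes on to argue, no finite search of this sort amounts to a proof.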

On the other hand the assertion has been proved for very many values of the 
exponent n; up to last year, this included all values of n up to 4,000,000. It is 
noteworthy that the very profound methods which were developed to do this have 
had a decisive influence on modern mathematics; the theory of the so-called 
algebraic numbers, from which so many general ideas stem, arose primarily from 
these efforts. And now we know, provided Wiles’ arguments are found to be 
watertight, that the assertion is valid for all n. 

All this leaves the layman somewhat perplexed—so simple a statement and yet 
so difficult to prove? And do professional mathematicians occupy themselves with 
such things—and get paid for doing so? The further questions, below, give us the 
opportunity, in this respect, to correct some misapprehensions. 


An invisible part of our culture. What have we gained? What are the consequences 
for our world of the proved result? 


Here we must in all honesty reply: we have gained nothing. That the theorem 
has been proved has no consequences, even for number theory itself. But—does 
one pose such a question in the face of a masterwork of art or an impressive 
achievement in sport? Mathematics is, like the arts, a part of our cultural tradition, 
and has always, in ancient and modern times, obtained its justification from this 
fact. But, in contrast to the arts and sport, mathematics has no general public. Its 
assertions, as we must recognize, are immediately accessible only to a small circle; 
and the newer, the deeper, the more abstract the result, the narrower the circle 
will be. Thus mathematics can scarcely rely on making a resounding splash in the 
media, apart from very exceptional cases such as Fermat’s Theorem. 

But, of course, that is just one side of the story. On the other side stand the 
innumerable applications of mathematics which one meets everywhere. Mathemat- 
ics has become an indispensable tool in the technico-scientific world of today, 
whether it is concerned with various kinds of calculation, with physics, chemistry, 
biology, medicine, meteorology, telecommunications etc. Even in simpler matters 
do all of us, knowingly or unknowingly, apply mathematical reasoning, when we 
speak of probability, extrapolation, analysis and interpretation of graphs, coding, 
averages and such like. 

One does not, however, reflect that all the mathematical concepts, methods and 
results which are applied are abstractions, which had to be thought up. And even 
the solution of apparently ‘frivolous’ and useless problems à la Fermat (and of 
many others originating in simple, practical questions) demands the elaboration of 
theoretical structures of great generality. The universal applicability of mathemat- 
ics, which, as a rule, is neither intended nor foreseeable, seems to depend on those 
conceptions; a few examples to illustrate this will be cited below. 

The relationship between these two very different aspects of mathematics is not 
easily comprehended. The instrument we employ for recognizing, describing, 
understanding and expressing by means of theoretical construction is mathematics, 
its language, its mode of thought, its results; that is, a structure of thought which is 
abstract and which is not primarily erected for this purpose. The applications bear 
witness to the power of mathematics, but are not its real motivation. The springs of 
“mathematization” seem to be of a very different kind. If we try to describe them, 
we need words like curiosity, thirst for knowledge, the impulse towards play. 

A game then, pretentious and difficult, as all good games should be? In a 
certain sense, yes. But one knows that ultimately it has significance and effect and 
that places the motivation close to that of the artist. And, as in the arts, the criteria 
of value and rightness are not easily made precise. They include intensity, beauty 
and unity of the expression, the opening of new horizons, and insights which stem 
from a profound struggle to understand the problem. Even this remains inevitably 
restricted to the circle of the ‘initiated’. Thus is our art invisible to a wider public. 


Mathematical proof. Why prove something which is known to be correct in 4,000,000 
cases, and more besides? Wouldn’t one regard this, in any other endeavour, itself as 
“proof”? 


Here we must again go further back and, above all, insist that all those 
mathematical concepts, which are daily and hourly in action, find no place in the 
real world we observe. The apparently simplest things like a straight line, 3-dimensional 
space, whole numbers, probability are creations of the human spirit, to say 
nothing of real or complex numbers, groups, vector spaces, integrals etc. Whether 
all these exist outside our thoughts or not, i.e., whether it is a matter of discovering 
or inventing, is also a bone of contention among mathematicians—but irrelevant 
here. 

Certainly these ideas arise originally from our observations and experience, 
mainly in the domain of geometry and physics on the one hand and numbers and 
counting on the other. But first must come the complete abstraction, the release 
from reality, to form from that experience a mathematical object. This is only 
defined by its combinatorial properties, which vary from case to case and which 
satisfy certain axioms; essential here is the structure of mutual relations. In the 
framework thus established we apprehend, guided by intuition and experiment, 
relationships, results, theorems. Whether they are correct one can only determine 
by a strictly logical analysis of the proof—otherwise one does not know whether 
they are valid. Experience shows that intuition may lead us astray. So long as we 
have no proof of Fermat’s Theorem, we cannot be sure that integer solutions do 
not exist for large values of the exponent n. 

Concerning the multiplicity of applications of mathematical structures and 
results, this obviously stems from their universality, their independence from 
concrete objects. Whether it concerns the forecast of an eclipse of the sun or the 
moon, the mathematical design of a bridge, the formulation of cosmological 
theories, the schemata of the physics of elementary particles, or the analysis of 
computer tomograms, there are always abstract, mathematical tools behind it, far 
removed from any reality. It would be very dangerous to apply them if one were 
not sure of their validity. 


No Nobel Prize. Will Andrew Wiles receive the Nobel Prize? 


There is no Nobel Prize for mathematicians; this doesn’t seem to be well-known, 
but it gives rise to speculation. Many explanations circulate, stories about conflicts 
between Nobel and a prominent mathematician of the time, and much more 
besides; as the President of the Nobel Committee once expressed it, not all these 
stories can be true. We don’t know the reason, we can only conjecture: mathematics 
was simply forgotten. As so often happens, it was seen as a tool, which is simply 
to hand and which we apply; the mathematician’s task is merely to carry out the 
necessary calculations. Even today when we generally recognize the significance of 
mathematics, people know very little of its true nature and inner beauty—because 
the research takes place within a narrower circle and is invisible from the outside. 
The non-mathematician sees only the tip of the iceberg. What is beneath? There 
lies this difficult and scarcely intelligible process of creating mathematical ideas 
and structures out of the vague experience and intuition of our environment, 
putting them to work and recognizing their connections; and even struggling with 
totally unexpected consequences of our own thinking. These are consequences which 
can give rise to far-reaching applications, from which further problems arise which 
call for new solutions or demand more new ideas. 

An example which especially well illustrates how mathematical thought emerges 
from the depths to break surface is the discovery of electromagnetic waves, 
certainly one of the most important events in the history of science and modern 
mankind. The credit should be given to the physicists James Clerk Maxwell 
(1831-1879) and Heinrich Hertz (1857-1894); but it rests heavily on mathematical 
theories which had been developed much earlier for other reasons (analysis, the 
wave equation), and which showed that the Maxwell-Heaviside equations lead 
inevitably to waves, and this was experimentally verified by Hertz. 

Similarly much else came in unexpected ways to be applied to the physical 
world: group theory, developed by Galois to study the solution of algebraic 
equations, has been applied to the elucidation of atomic spectra; Boolean algebra, 
which stems from mathematical logic, is applied to electric circuit theory; the 
Radon transform has been applied to computer tomography; category theory to 
the design of automata and formal languages; differential geometry, topology and 
algebra to the new theoretical physics. Always there were completely different 
reasons for creating and formulating the mathematical concepts—or perhaps no 
other reason but the inner beauty of the conceptual construction? 


What about the computer? Can one not simply leave the difficult considerations 
involved in the Fermat proof to the computer? 


This question is often asked, with some justification. For it is known not only to 
those involved, but also to the outsiders, that this is the era of the computer, which 
has immeasurably increased the possibilities for applying mathematical thought to 
our world. Moreover, not only applied, but also pure mathematicians, are using 
the computer in the most intensive way, to experiment, to verify conjectures, to 
render complicated geometric situations intelligible, and to push through difficult 
algebraic manipulations. But none of this replaces strict conceptual proof; on the 
contrary, it, in fact, depends on its logical foundations. 

Now in an article which appeared last year in the Scientific American the 
“Death of Proof” was announced². The text was very well documented and 
contained quotations from well-known mathematicians. Classical proofs within a 
conceptual framework were to be replaced by visualization and verification, 
naturally on a computer; the Fermat proof by Wiles was characterized as a 
“splendid anachronism”. The article released a flood of indignant protests, even 
from mathematicians quoted in the article. All were agreed that the actual 
situation had been completely misunderstood. Semistrict arguments lead to 
semitruths which are correct only with a certain probability, or even false (and for 
whose uncertain validity huge amounts of computer time must be financed). 

One could ignore this if a danger did not present itself whose consequences 
could be worse than one thinks. On the basis of such thinking a worldwide, 
fundamental restructuring of mathematics education could be proposed, which 
would replace everything by interdisciplinary games on the computer. It appears 
that already textbooks and software in this direction have been prepared, and here 
too certain reformers are following the same trend. Thus would the growing 
generations believe what they see on the screen, without knowing that “nothing 
has been proved”. And the experience of the inner beauty of mathematical 
thought would be withheld from them. Mathematics must be used according to its 
true nature, abstract, valid within a strict context, universal, and, precisely for that 
reason, eminently practical. 


²John Horgan, The death of proof, Scientific American, October 1993. 
So do the words of Hermann Weyl³, uttered 50 years ago, take on a new 
urgency: 

“We do not claim for mathematics the prerogative of a Queen of Science; there 
are other fields which are of the same or even higher importance in education. But 
mathematics sets the standard of objective truth for all intellectual endeavours; 
science and technology bear witness to its practical usefulness. Besides language 
and music it is one of the primary manifestations of the free creative power of the 
human mind, and it is the universal organ for world-understanding through 
theoretical construction. Mathematics must therefore remain an essential element 
of the knowledge and abilities we have to teach, of the culture we have to transmit, 
to the next generation.” 


Eckmann:                         Hilton: 
Mathematik                       Department of Mathematical Sciences 
ETH-Zentrum                      SUNY at Binghamton 
CH-8092 Zurich                   P.O. Box 6000 
SWITZERLAND                      Binghamton, NY 13902-6000 
eckmann@math.ethz.ch 


³From the first page of the Collected Works of Hermann Weyl, edited by K. Chandrasekharan 
(Springer Verlag, 1968). 


It is not knowledge, but the act of learning, not possession but the 
act of getting there, which grants the greatest enjoyment. When I 
have clarified and exhausted a subject, then I turn away from it, in 
order to go into darkness again; the never-satisfied man is so strange 
—if he has completed a structure, then it is not in order to dwell in 
it peacefully, but in order to begin another. I imagine the world 
conqueror must feel thus, who, after one kingdom is scarcely 
conquered, stretches out his arms for others. 


—Karl Friedrich Gauss (1777-1855) 
Letter to Bolyai, 1808. 


Answer to Picture Puzzle 
(p. 684) 


Alex Rosenberg. 
Teaching Math More Effectively, 
Through Calculational Proofs 


David Gries and Fred B. Schneider 


Lower-level college math courses usually avoid using formalism, in both definitions 
and proofs. Later, when students have mastered definitions and proofs written 
largely in English, they may be shown how informal reasoning could be formalized, 
but the impression is left that such formalization is not worth the effort. The 
design of proofs is also not taught. Students see proofs and may be asked to 
develop a few themselves, but there is little or no discussion of principles or 
strategies for designing proofs. 

Few are happy with the results of these courses. Generally, students’ reasoning 
abilities are poor, even after several math courses. Many students still fear math 
and notation, and the development of proofs remains a mystery to most. In short, 
students are not being equipped with the tools needed to employ mathematics in 
solving new problems. 

We believe that this state of affairs can be improved. This article describes our 
approach. 


THE INADEQUACY OF INFORMAL PROOFS. A proof of a theorem should 
provide evidence for belief in the validity of the theorem, where the evidence 
consists of facts (e.g. previously proved theorems) and an explanation of how they 
interact to convince. A good presentation of a proof should clearly indicate the 
facts and explain how they are combined. It should also make the proof appear so 
obvious that readers can see how it was developed, can explain it to others, and 
perhaps can prove other theorems in a similar fashion. 

Now look at the proof in Table 1, which was taken from a math text and is 
typical of informal proofs. First, note that this proof does not state the facts on 
which it rests. (For example, it says, “If $y \notin A$, then, since $y \in A \cup B$ we must 
have $y \in B$”, but there is no reference to the theorem that justifies this inference.) 



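TABLE 1. Conventional Proof of $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ 


We first show that $A \cup (B \cap C) \subseteq (A \cup B) \cap (A \cup C)$. If $x \in A \cup (B \cap C)$, then either 
$x \in A$ or $x \in B \cap C$. If $x \in A$, then certainly $x \in A \cup B$ and $x \in A \cup C$, so $x \in (A \cup B) \cap (A \cup C)$. 
On the other hand, if $x \in B \cap C$, then $x \in B$ and $x \in C$, so $x \in A \cup B$ and $x \in A \cup C$, 
so $x \in (A \cup B) \cap (A \cup C)$. Hence, $A \cup (B \cap C) \subseteq (A \cup B) \cap (A \cup C)$. 


Conversely, if $y \in (A \cup B) \cap (A \cup C)$, then $y \in A \cup B$ and $y \in A \cup C$. We consider two 
cases: $y \in A$ and $y \notin A$. If $y \in A$, then $y \in A \cup (B \cap C)$, and this part is done. If $y \notin A$, 
then, since $y \in A \cup B$ we must have $y \in B$. Similarly, since $y \in A \cup C$ and $y \notin A$, we have 
$y \in C$. Thus, $y \in B \cap C$, and this implies $y \in A \cup (B \cap C)$. Hence $(A \cup B) \cap (A \cup C) \subseteq 
A \cup (B \cap C)$. The theorem follows. 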
Second, it is difficult to see precisely how the facts interact: the sequence and 
subsequences of inferences and all the case analyses in the proof cannot be easily 
digested. The structure of the proof is hidden by all the verbiage. One case analysis 
is presented in two paragraphs and others by sequential sentences within a 
paragraph; however, sequential sentences are also used to define steps common to 
all cases. Finally, this proof yields little insight into its development—how did it 
arise? 

And yet, in spite of its inadequacies, this proof (and others like it) is held up as 
a model for students to emulate. 


CALCULATIONAL PROOFS IN AN EQUATIONAL LOGIC. Our thesis is that 
mathematics and rigorous thinking can be taught more effectively by first teaching 
the design of rigorous proofs using a formal logic. However, the choice of logic and 
the accompanying method of proof is critical to success. In our experience, an 
equational logic, which is based on equality and Leibniz’s “substitution of equals 
for equals”, is most suitable because it has the following characteristics. 


• Equational logic is easy to teach, since the style is already familiar to those 
  who have had high-school algebra. 

• Equational logic provides an alternative to reasoning in English. Rarely do 
  proofs in equational logic parrot informal English arguments. Instead, proofs 
  are calculational, in that they are developed by calculating using the rules of 
  the logic, much as one calculates to solve a problem in high-school algebra. 
  Further, principles and strategies can be used to help discover theorems and 
  proofs. 

• The rigorous use of equational logic need not lead to overwhelming complexity 
  (as is the case with some logics). On the contrary, it is often a simplifying 
  force. Typically, calculational proofs are shorter, simpler, and easier to 
  remember than informal English proofs. 

• Equational logic is versatile: it can be extended to a wide variety of 
  mathematical domains. 


Table 2 contains a calculational proof of theorem $p \lor q \equiv p \lor \lnot q \equiv p$. Note 
that equivalence $\equiv$ is being treated associatively, so that this theorem can be 


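TABLE 2. Equational proof of $p \lor q \equiv p \lor \lnot q \equiv p$ 


        $p \lor q \equiv p \lor \lnot q$ 
$=$       ⟨Distr. of $\lor$ over $\equiv$, $p \lor (q \equiv r) \equiv p \lor q \equiv p \lor r$⟩ 
        $p \lor (q \equiv \lnot q)$ 
$=$       ⟨$q \equiv \lnot q \equiv \mathit{false}$⟩ 
        $p \lor \mathit{false}$ 
$=$       ⟨Identity of $\lor$, $p \lor \mathit{false} \equiv p$⟩ 
        $p$ 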


viewed either as $(p \lor q \equiv p \lor \lnot q) \equiv p$ or as $p \lor q \equiv (p \lor \lnot q \equiv p)$. Symbol $=$ 
is used conjunctionally: $b = c = d$ is equivalent to $b = c \land c = d$.¹ Use of associativity 
of equivalence helps avoid formal detail without sacrificing rigor; our 
notation is designed with an eye to preventing complexity from overwhelming. 
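
The associative reading of equivalence and the conjunctional reading of equality can be checked mechanically over all truth assignments. The Python sketch below is an illustration added for this purpose and is not the authors' method, which is a calculational proof rather than a truth table; the function names are hypothetical. It happens that Python's chained comparison `b == c == d` is itself conjunctional (it means `b == c and c == d`), so it models the article's $=$ rather than its associative $\equiv$.

```python
from itertools import product


def equiv(a, b):
    """Boolean equivalence, used here as an explicitly parenthesized binary operator."""
    return a == b


def theorem_left(p, q):
    # (p or q  equiv  p or not q)  equiv  p
    return equiv(equiv(p or q, p or (not q)), p)


def theorem_right(p, q):
    # p or q  equiv  (p or not q  equiv  p)
    return equiv(p or q, equiv(p or (not q), p))


def conjunctional_reading(p, q):
    # Chained comparison: (p or q) == (p or not q) and (p or not q) == p
    return (p or q) == (p or (not q)) == p


assignments = list(product([False, True], repeat=2))
left_holds = all(theorem_left(p, q) for p, q in assignments)
right_holds = all(theorem_right(p, q) for p, q in assignments)
conjunctional_holds = all(conjunctional_reading(p, q) for p, q in assignments)
```

Both parenthesizations are valid for every assignment, while the conjunctional reading fails whenever `p` is false. This is why the two symbols need to be kept apart.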


¹Operator $\equiv$ is used for equality over booleans; $=$ is used for equality over any type, including 
boolean. 
Each step of the proof in Table 2 has the following form. 


        $E[v := P]$ 
$=$       ⟨$P = Q$⟩ 
        $E[v := Q]$ 


Such a step shows equality of two formulas using the rule of “substitution of equals 
for equals”. The hint between the two formulas shows the equality being used in 
the substitution (E[v := P] denotes expression E with every free occurrence of 
variable v replaced by expression P). Transitivity of equality allows us to conclude 
that the first and last formula of the proof of Table 2 are equal. 

Notice that the proof format makes it easy to find the facts on which the proof 
depends; they are given within the angle brackets ⟨ and ⟩. Here, we have written 
out the full text of each fact, but we usually use the name or number of an already 
proved theorem.² 

Explicit principles and strategies drove the development of the proof in Table 2. 
For example, one strategy for proving P = Q is to transform the more complicated 
of P and Q into the simpler one. In the proof, we viewed the formula to be proved 
as $(p \lor q \equiv p \lor \lnot q) \equiv p$ and started with the more complicated, left-hand term. 
Second, the proof in Table 2 is “opportunity driven” or “forced”, in that at each 
step, the shape of the formula almost dictates in a unique way what substitution to 
make. Here, the shape of the first line of the proof cries out for simplification 
using distribution of $\lor$ over $\equiv$. The second step is an equally obvious simplification, 
based on the shape of the formula. 

Table 3 gives our calculational proof of distributivity of set union over set 
intersection. In contrast to the proof of Table 1, this proof exhibits all the good 


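TABLE 3. Calculational Proof of $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ 


Below, we prove $v \in A \cup (B \cap C) \equiv v \in (A \cup B) \cap (A \cup C)$. By Extensionality (the definition 
of equality of sets), we then conclude $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$. 


        $v \in A \cup (B \cap C)$ 
$=$       ⟨Definition of $\cup$⟩ 
        $v \in A \lor v \in B \cap C$ 
$=$       ⟨Definition of $\cap$⟩ 
        $v \in A \lor (v \in B \land v \in C)$ 
$=$       ⟨Distr. of $\lor$ over $\land$⟩ 
        $(v \in A \lor v \in B) \land (v \in A \lor v \in C)$ 
$=$       ⟨Definition of $\cup$, twice⟩ 
        $(v \in A \cup B) \land (v \in A \cup C)$ 
$=$       ⟨Definition of $\cap$⟩ 
        $v \in (A \cup B) \cap (A \cup C)$ 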


qualities mentioned earlier. It refers to all the facts it uses (e.g. the definition of 
$\cup$). Its structure is simple, with each step being clearly delineated. And, it is based 


²Formally, our logic consists of 15 axioms and 4 inference rules, and a theorem is either an axiom or 
a formula that is derived using the inference rules. The inference rules are Substitution of equals for 
equals (Leibniz), Transitivity of equality, Substitution, and Equanimity: 
Leibniz: $\vdash P = Q$ then $\vdash E[z := P] = E[z := Q]$ 
Transitivity: $\vdash P = Q,\ Q = R$ then $\vdash P = R$ 
Substitution: $\vdash P$ then $\vdash P[z := Q]$ 
Equanimity: $\vdash P,\ P \equiv Q$ then $\vdash Q$ 


on a strategy—one that is used over and over in mathematics: To prove something 
about operators (here, $\cup$ and $\cap$), eliminate them using their definitions, perform 
some manipulation, and reintroduce the operators. 

Anyone experienced in such calculational proofs will find the proofs of Tables 2 
and 3 obvious and straightforward and will have no difficulty reproducing them. 
And, although these proofs are rigorous (and could be checked by a mechanical 
proof checker), complexity does not overwhelm. 
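
The chain of equalities in Table 3 can also be spot-checked by exhaustive evaluation over a small universe. The Python sketch below is an illustration under stated assumptions: the helper names `subsets` and `table3_chain_holds` and the three-element universe are choices made for this example, and an exhaustive check on one finite universe is evidence rather than a proof. It evaluates the first, middle and last lines of the calculation for every choice of A, B, C and every element v.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def table3_chain_holds(universe):
    """Check that three lines of the Table 3 calculation agree for all A, B, C, v."""
    all_sets = subsets(universe)
    for A in all_sets:
        for B in all_sets:
            for C in all_sets:
                for v in universe:
                    first = v in (A | (B & C))                    # v in A ∪ (B ∩ C)
                    middle = (v in A) or ((v in B) and (v in C))  # definitions unfolded
                    last = v in ((A | B) & (A | C))               # v in (A ∪ B) ∩ (A ∪ C)
                    if not (first == middle == last):
                        return False
    return True


chain_ok = table3_chain_holds({0, 1, 2})
```

The check follows the strategy named in the text: the set operators are eliminated in favor of membership tests, the propositional form is compared, and the operators are reintroduced.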

Equational logic and the calculational approach can be extended to all domains 
typically taught in a first discrete math course—e.g. set theory, mathematical 
induction, a theory of integers, functions and relations, combinatorics, and recur- 
rence relations. This is done by first defining the pure predicate calculus and then 
extending it by adding new types, presenting axioms that define the manipulative 
properties of the operations on those types, and building up a library of theorems. 

A key to making rigor and formalism palatable is to keep notation consistent 
and uniform. Mathematics employs a number of different notations for quantifica- 
tion—see, for example, the left column of Table 4. We replace these different 
forms by a single notation for all quantifications. For any operator * that is 
associative, is symmetric, and has an identity, the notation³ 


(*i|R.i: P.i) 


denotes the “accumulation” using operator * of the values of expression P.i over 


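TABLE 4. A Uniform Notation for Quantification 


Conventional notation                                   Uniform notation 

$\sum_{i=1}^{3} i^2$                                    $(+\, i \mid 1 \le i \le 3 : i^2)$ 
$(\forall i).\ 1 \le i \le 3 \Rightarrow b[i] = 0$      $(\land\, i \mid 1 \le i \le 3 : b[i] = 0)$ 
$(\exists i).\ 1 \le i \le 3 \land b[i] = 0$            $(\lor\, i \mid 1 \le i \le 3 : b[i] = 0)$ 
$\bigcup_{i=1}^{3} S_i$                                 $(\cup\, i \mid 1 \le i \le 3 : S_i)$ 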


all values of $i$ that satisfy range-predicate $R.i$. For example, Table 4 gives the 
conventional notation and a more uniform notation for four different quantifications. 
Other operators that can be used for * are multiplication of integers, reals, 
and complex numbers, $b \cdot c$; union of sets, $S \cup T$; intersection of sets, $S \cap T$; 
minimum of two values, $b \downarrow c$ (if $\downarrow$ does not have an identity, axioms and theorems 
that deal with a false range $R.i$ are not applicable); maximum of two values, $b \uparrow c$; 
and greatest common divisor, $b \mathbin{\mathrm{gcd}} c$. 

With a single notation, scope, free occurrence of a variable, and bound occur- 
rence of a variable can be defined for all quantifications just once. More impor- 
tantly, general axioms and theorems for manipulating quantifications can be 
introduced. The issue of quantification is thus simplified. 
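
The idea of a single quantification form can be mirrored in code as a fold over the values that satisfy the range. The following Python sketch is only an analogy under stated assumptions: the function `quantify` and its parameters are hypothetical names introduced here, the bound variable ranges over an explicitly supplied finite collection, and the operator's identity is passed in by the caller.

```python
from functools import reduce
import operator


def quantify(op, identity, values, R, P):
    """Model of (op i | R.i : P.i): accumulate P(i) with op over the i in values where R(i) holds.

    op must be associative and symmetric, and identity must be its identity element.
    """
    return reduce(op, (P(i) for i in values if R(i)), identity)


def in_range(i):
    return 1 <= i <= 3


candidates = range(0, 10)
b = [7, 0, 0, 0, 5]                       # b[1], b[2], b[3] are all zero
S = {1: {1}, 2: {2, 3}, 3: {3}}           # S[i] for 1 <= i <= 3

total = quantify(operator.add, 0, candidates, in_range, lambda i: i * i)
all_zero = quantify(operator.and_, True, candidates, in_range, lambda i: b[i] == 0)
some_zero = quantify(operator.or_, False, candidates, in_range, lambda i: b[i] == 0)
union = quantify(operator.or_, frozenset(), candidates, in_range, lambda i: S[i])

# A false range yields the identity of the operator.
empty_sum = quantify(operator.add, 0, candidates, lambda i: False, lambda i: i * i)
```

The four calls correspond to the four rows of Table 4. The last call shows why the operator needs an identity: with a false range there is nothing to accumulate, so the identity is the only sensible value.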

After introducing rules for quantification, it is easy to introduce pure predicate 
calculus. Operators $\land$ and $\lor$ are associative, are symmetric, and have identities, 


³Bound variable $i$ can be annotated with a type to indicate the range of values it may assume. A 
discussion of types is outside the scope of this article. Also, we write $R.i$ to denote application of 
function $R$ to argument $i$; eliminating the traditional parentheses avoids clutter. 
so $(\land\, i \mid R.i : P.i)$ and $(\lor\, i \mid R.i : P.i)$ make sense. The first is universal quantification, 
more conventionally written as $(\forall i \mid R.i : P.i)$; the second is existential quantification, 
$(\exists i \mid R.i : P.i)$. 


TEACHING THE CALCULATIONAL APPROACH. Equational propositional 
logic, along with preliminaries (e.g. the definition of textual substitution) can be 
taught to college freshmen in four weeks. During that time, students see many 
proofs and develop many themselves, in the calculational style. They also learn 
strategies and principles for designing proofs. As students develop skill in 
proving theorems, they learn that attention to rigor can be a simplifying force and 
not an onerous burden. 

Four weeks may seem like a long time to spend on propositional logic, but 
learning the calculational approach and gaining confidence in formal manipulation 
requires it and is worth it. Initially, most students are troubled by the prospect of 
uninterpreted manipulation. They want to think about the meanings of mathemati- 
cal statements. Having meanings for objects is a “safety net”, which, students feel, 
prevents them from performing nonsensical manipulations. Unfortunately, the use 
of the “meaning” safety net does not scale well to complicated problems. Skill in 
performing uninterpreted syntactic manipulation does. 

Students also have to be convinced that using formalism can be helpful. They 
must see first hand that a rigorous approach can help them solve problems they 
could not easily solve without it. This is possible with our approach. After just 
three days of learning equational logic, one can begin to attack the kinds of word 
problems that are found in Smullyan’s books, for example. 

Once logic and proof have been thoroughly presented, other topics can be 
discussed—set theory, a theory of integers, and mathematical induction. Each 
topic is presented using the same calculational approach. In this manner, the 
notions of proof and proof style become the unifying force, the glue that binds 
together arguments in all domains. Discussion of informal versus formal presenta- 
tions of proofs imparts deeper understanding of both, enabling students to deal 
more easily with math that they will see in later courses. For example, proof by 
contradiction in any domain is easily seen to be based on the theorem $\lnot p \equiv p \Rightarrow \mathit{false}$ 
of propositional logic. 

As an example of the greater understanding that rigor and precision allow, 
suppose we have proved the metatheorem that a formula $P$ is a theorem iff the 
formula $(\forall x \mid : P)$ is a theorem. Then, the different ways in which theorems are 
expressed in texts can be discussed, and the following three statements can be seen 
to be equivalent. In the first, it is assumed informally that a and b are 
integers—perhaps this is mentioned in the accompanying prose; in the second, the 
type is given informally; in the third, the type is made formally explicit. 


$a + b = b + a$ 
$a + b = b + a$ (for $a$, $b$ integers) 
$(\forall a, b : \mathbb{Z} \mid : a + b = b + a)$ 


To make rigor and formalism palatable, every new notation must be explained 
and rules must be given for manipulating it. Fear of formalism comes from having 
to use a formalism without knowing rules for its use, and attention in a class to 
such basic detail overcomes this fear. For example, traditionally, students are not 
shown rules for manipulating summations like $\sum_{i=1}^{n} i^2$; consequently, they have 
trouble with mathematical induction, where problems require manipulation of 
such summations. 

The following example shows how attention to rigor and formal detail provides 
a measure of clarity that is impossible to obtain otherwise. Consider proving 
$b^{m+n} = b^m \cdot b^n$, for $n, m$ natural numbers, by mathematical induction. Without 
formalizing quantification and having rules for manipulating it, no amount of 
informal explanation will clarify for students the different roles of $m$ and $n$ in the 
proof. However, $b^{m+n} = b^m \cdot b^n$ is equivalent to 
$(\forall m, n \mid 0 \le n \land 0 \le m : b^{m+n} = b^m \cdot b^n)$, 
which can be rewritten (using an axiom of quantification and the ability to 
name a formula) as 


$(\forall n \mid 0 \le n : P.n)$ where $P.n : (\forall m \mid 0 \le m : b^{m+n} = b^m \cdot b^n)$. 


Now it is clear that n is the “induction variable” and that induction hypothesis P.n 
is a universal quantification over m. 

Further, once students understand quantification, they can prove the following, 
using a calculational proof. Let $U$ be a set and $\prec$ a binary relation over $U$. 
Then $(U, \prec)$ admits induction iff $(U, \prec)$ is well founded. This theorem, which is 
rarely mentioned in informal presentations, gives deeper insight into induction. 

When formal notations are presented properly, as a repository of the facts and a 
means of clarification, students begin to like formalism and to rely on it. It is the 
formalism that provides rules for judging between sound and unsound inference 
and that helps expose ambiguity and eliminate it. 


DISCUSSION. The rigorous approach to teaching math has not, as yet, been 
accepted. Two criticisms are heard frequently: (1) students can’t handle rigor and 
formalism, and (2) teaching syntactic manipulation impedes understanding that a 
more semantic and informal approach provides. 

Our own experience belies the first criticism; in fact, the criticism should go the 
other way. Teaching mathematics through informalism is like driving in a fog. One 
sees dim figures in the distance, and every once in a while some of them suddenly 
appear clearly, but usually everything is veiled and mysterious. It’s dangerous to 
drive in the fog, especially in a strange territory, and one must drive slowly. Even 
so, one may not always be sure where one is. Teaching rigor and precision,
provided it is done without the veil of complexity interfering, burns away the fog, 
leaving everything crisp and clear and making it possible to drive faster and to 
enter uncharted lands. 

We can rebut the criticism concerning semantics versus syntactics as well. An 
informal proof, like that in Table 1, can be translated into a proof in a natural- 
deduction or Hilbert-style logic. The resulting proof is every bit as syntactic as 
ours. The English proof is simply an informal version of a syntactic proof—and, as 
we have seen, a poor one at that. Therefore, the informal proof has no more 
meaning or semantics than a formal calculational proof. 

Perhaps this criticism concerning semantics comes about because formal state- 
ments are sometimes difficult to understand. However, presenting a formal defini- 
tion or theorem does not preclude giving alternative views as well. For example, a 
presentation of the axiomatic definition of set union can be supplemented with a 
Venn diagram, an English description, and an informal notion of evaluation. 
Nevertheless, it should be realized that for purposes of reasoning—constructing 
proofs—it is the axiomatic definition that is important. In fact, the axiomatic 
definition should be viewed as encoding all the meaning of the object being 
defined.

We also hear complaints that our approach suppresses intuition, that everything
begins to appear mechanical. By “intuition” one usually means direct perception of 
truth or fact, independent of any reasoning process; keen and quick direct insight; 
or pure, untaught, noninferential knowledge (Webster’s Encyclopedic Unabridged 
Dictionary, 1989). There is simply no hope of teaching this—how can one teach 
something that is untaught, noninferential, and independent of any reasoning 
process? Of course, one can hope that students will develop an ability to intuit by 
watching instructors in math courses over the years. But this hit-or-miss prospect 
cannot be called teaching intuition. 

On the other hand, a good part of mathematics is concerned with the opposite 
of intuition: with new and different reasoning processes that complement our 
ability to reason in English. This part of mathematics can be taught, and our 
approach to logic is an excellent vehicle for that task. Further, using the calcula- 
tional approach to proofs, we are able to teach aids to discovery. In particular, with 
our disciplined, syntactic, proof style, we can teach principles and strategies whose 
application can indeed lead to the discovery of some (but not all) theorems and 
proofs. We have yet to see comparable principles and strategies for conventional 
English proofs. 

Note that we are not against intuition; we have only stated that it cannot be 
taught. Moreover, we believe that discussing aids to discovery, as explained in the 
last paragraph, does not suppress intuition but goes further in aiding it than does 
the conventional method of teaching proofs. 

New ideas in teaching are slow to catch on. People don’t like changing their 
habits—especially if it requires them to change their own way of thinking. 
However, current teaching methods are not exciting students or even educating 
them well, and alternatives should be seriously considered. Our approach bears 
looking into by all who want to teach mathematics effectively.¹


Department of Computer Science 
Cornell University
Ithaca, NY 14853
gries@cs.cornell.edu

¹The authors’ 500-page text A Logical Approach to Discrete Math (Springer Verlag, NY, 1993) uses
the approach described in this article in teaching the usual topics in discrete math—logic, set theory, a 
theory of integers, induction, functions and relations, combinatorics, solving recurrence relations, and 
graph theory. The 300-page Instructor’s Manual contains other essays that concern the approach, as 
well as answers to the exercises. Together, the text and Instructor’s Manual contain over 700 
calculational proofs, most of which are short and simple. Contact Gries at gries@cs.cornell.edu to 
obtain the Instructor’s Manual.


Some Problems Concerning Recurrence
Sequences


G. Myerson and A. J. van der Poorten 


There are questions about recurrence sequences that seem to crop up again and 
again. Plainly, though their answers are well known they are not known well. We 
endeavour to explain these answers in context so that they may become more 
widely known. The sequence 0, 1, −1, 2, −2, ..., in which each integer occurs
exactly once, is a recurrence sequence; that is, it satisfies a linear, homogeneous
recurrence relation with constant coefficients, namely,


$$a_n = -a_{n-1} + a_{n-2} + a_{n-3}.$$


It is not hard to produce a recurrence sequence in which each integer occurs 
exactly twice, or for that matter exactly n times, for any given n—we will show 
how to do this later. Can there be a recurrence sequence in which each integer 
occurs infinitely often? In which every rational number occurs? Every Gaussian 
integer? We will present the theory that enables us to answer these and many 
other questions about the range of a recurrence. At the pinnacle of this theory is 
the beautiful Skolem-Mahler-Lech Theorem, which deserves to be more widely 
known. 

Let us first make some very general remarks about recurrence sequences. 
Suppose that the sequence $a_0, a_1, \ldots$ satisfies the relation

$$a_{h+n} = s_1 a_{h+n-1} + \cdots + s_n a_h$$

for some complex numbers $s_1, \ldots, s_n$ and for $h = 0, 1, \ldots$. Taking $h = 0$, we see
that $a_n$ is in the ring $\mathbf{Z}[a_0, \ldots, a_{n-1}, s_1, \ldots, s_n]$. An easy induction argument shows
that, in fact, all the terms in the sequence belong to this ring. Thus, the entire
sequence belongs to a ring finitely generated over $\mathbf{Z}$, the integers.

It follows immediately that it is impossible for every rational number to occur in 
a recurrence sequence, as the rationals are not contained in any finitely generated 
extension of the integers. 

A little more is true. If we are dealing with rational (or even algebraic) numbers
then it makes sense to speak of a common denominator $d_0$ for the numbers
$a_0, \ldots, a_{n-1}$ and a common denominator $d$ for $s_1, \ldots, s_n$. It is clear by induction
(or immediate by what we say below) that then the numbers $d_0 d^h a_h$ all are
integers.


1. THE SKOLEM-MAHLER-LECH THEOREM. To settle the other questions 
raised in our opening paragraph, we must invoke the theorem of Skolem, Mahler, 
and Lech:


Theorem A. If $a_0, a_1, \ldots$ is a recurrence sequence, then the set of all $k$ such that
$a_k = 0$ is the union of a finite (possibly empty) set and a finite number (possibly zero)
of full arithmetic progressions.

Here, a full arithmetic progression means a set of the form $\{r, r + d, r + 2d, \ldots\}$
with $0 \le r < d$. To illustrate, consider the sequence given by the recurrence
$a_{n+6} = 6a_{n+4} - 12a_{n+2} + 8a_n$, with initial conditions $(a_0, \ldots, a_5) = (8, 0, 9, 0, 8, 0)$:

8, 0, 9, 0, 8, 0, 4, 0, 0, 0, 16, 0, 128, 0, ....

The set of $k$ such that $a_k = 0$ is the union of the finite set $\{8\}$ and the full
arithmetic progression $\{1, 3, 5, \ldots\}$; in fact, the sequence is given by $a_n = 0$ if $n$ is
odd, $a_n = (n - 8)^2\, 2^{(n-6)/2}$ if $n$ is even.
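
The example can be checked numerically. The following is a minimal sketch in Python; the helper names `recurrence_terms` and `example_step` are illustrative and do not come from the article. It iterates the recurrence from the six initial values, collects the indices of the zero terms among the first 40, and compares the even-indexed terms with the closed form, written as $8a_n = (n - 8)^2\, 2^{n/2}$ so that everything stays in integers.

```python
def recurrence_terms(initial, step, count):
    """Return the first `count` terms, extending `initial` with `step`."""
    terms = list(initial)
    while len(terms) < count:
        terms.append(step(terms))
    return terms[:count]


def example_step(terms):
    # a_{n+6} = 6 a_{n+4} - 12 a_{n+2} + 8 a_n, read off from the end of the list.
    return 6 * terms[-2] - 12 * terms[-4] + 8 * terms[-6]


terms = recurrence_terms([8, 0, 9, 0, 8, 0], example_step, 40)

# Indices k < 40 with a_k = 0.
zeros = [k for k, value in enumerate(terms) if value == 0]
even_zeros = [k for k in zeros if k % 2 == 0]
odd_zeros = [k for k in zeros if k % 2 == 1]

# Closed form for even n, cleared of the factor 2^{-3}: 8 a_n = (n - 8)^2 2^{n/2}.
closed_form_ok = all(
    8 * terms[n] == (n - 8) ** 2 * 2 ** (n // 2) for n in range(0, 40, 2)
)
```

Within this range, `zeros` consists of the odd indices together with the single even index 8, which is the shape of zero set that Theorem A describes.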

As so often happens, the proof of the theorem involves notions rather more 
sophisticated than its statement; so much so, that we can give only the barest 
sketch here. We will first tell the story of generalized power sums and make some 
introductory remarks about p-adic analysis, two of the important notions underly- 
ing the proof of the Skolem-Mahler-Lech Theorem, and of interest in their own 
right. The reader who is willing to accept the theorem on faith and eager to see the 
solutions of the problems posed above can read enough of the next section to 
understand the notation and then skip to Section 6 for the applications. The 
ambitious reader may then go on to the more advanced exposition written by the 
second author [vdP], or the detailed proof of Theorem A given by Cassels [Cas]. 

We note in passing that, for our purposes, 7,0,0,0,... is not a recurrence
sequence; the recurrence must hold from the start. The reader will experience no 
difficulty in extending the results given here to recurrences that only kick in after 
one or more terms of a sequence. 


2. GENERALIZED POWER SUMS. A generalized power sum $a(h)$, $h = 0, 1, 2, \ldots$
is an expression of the shape


$$a(h) = \sum_{i=1}^{m} A_i(h)\,\alpha_i^h, \qquad h = 0, 1, 2, \ldots \tag{1}$$


with roots $\alpha_i$, $1 \le i \le m$, which are distinct non-zero quantities, and coefficients
$A_i(h)$ which are polynomials respectively of degree $n_i - 1$, for positive integers $n_i$,
$1 \le i \le m$. The generalized power sum $a(h)$ is said to have order $n = \sum_{i=1}^{m} n_i$.

Set

$$s(X) = \prod_{i=1}^{m} (1 - \alpha_i X)^{n_i} = 1 - s_1 X - \cdots - s_n X^n. \tag{2}$$

Then the sequence $(a_h)$ with $a_h = a(h)$, $h = 0, 1, 2, \ldots$ satisfies the recurrence
relation

$$a_{h+n} = s_1 a_{h+n-1} + \cdots + s_n a_h, \qquad h = 0, 1, 2, \ldots. \tag{3}$$


To see this let $E : f(h) \mapsto f(h + 1)$ be the shift operator. Its properties include:


(i) $E^n(f(h)) = f(n + h)$,
(ii) $E(f + g) = E(f) + E(g)$, and
(iii) for all complex $\alpha$ and $\beta$,

$$(E - \alpha)(E - \beta) = (E - \beta)(E - \alpha) = E^2 - (\alpha + \beta)E + \alpha\beta.$$

We have

$$(E - \alpha_i)\bigl(A_i(h)\alpha_i^h\bigr) = A_i(h + 1)\alpha_i^{h+1} - A_i(h)\alpha_i^{h+1} = \bigl(\Delta A_i(h)\bigr)\alpha_i^{h+1},$$

where $\Delta A_i(h) = A_i(h + 1) - A_i(h)$ is a polynomial of lower degree than that of
$A_i$. By induction, $(E - \alpha_i)^{n_i}\bigl(A_i(h)\alpha_i^h\bigr)$ is identically zero. Let $P$ be the operator
given by $P = \prod_{i=1}^{m} (E - \alpha_i)^{n_i}$. It follows that

$$P(a(h)) = P \sum_{j=1}^{m} A_j(h)\alpha_j^h = \sum_{j=1}^{m} P\bigl(A_j(h)\alpha_j^h\bigr) = 0.$$

But

$$P(a(h)) = (E^n - s_1 E^{n-1} - \cdots - s_n)\,a(h) = a(h + n) - s_1 a(h + n - 1) - \cdots - s_n a(h).$$
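
As a worked instance of (2) and (3), offered here as an illustration rather than taken from the text, take the two roots $\sqrt{2}$ and $-\sqrt{2}$, each with a coefficient polynomial of degree 2, so that $n_1 = n_2 = 3$ and the order is $n = 6$. The choice of coefficient polynomial $(h - 8)^2/16$ is an assumption made so that the result matches the illustrative sequence of Section 1.

```latex
% Roots \alpha_1 = \sqrt{2}, \alpha_2 = -\sqrt{2}, with n_1 = n_2 = 3 (assumed example).
s(X) = (1 - \sqrt{2}\,X)^3 (1 + \sqrt{2}\,X)^3 = (1 - 2X^2)^3
     = 1 - 6X^2 + 12X^4 - 8X^6 ,
% so s_2 = 6, s_4 = -12, s_6 = 8 and s_1 = s_3 = s_5 = 0, and (3) reads
a_{h+6} = 6\,a_{h+4} - 12\,a_{h+2} + 8\,a_h .
% With A_1(h) = A_2(h) = (h-8)^2/16 the power sum is
a(h) = \frac{(h-8)^2}{16}\,\bigl( (\sqrt{2})^h + (-\sqrt{2})^h \bigr)
     = \begin{cases} (h-8)^2\, 2^{(h-6)/2}, & h \text{ even}, \\ 0, & h \text{ odd}. \end{cases}
```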


Thus generalized power sums correspond to the sequences satisfying the recurrence
relations (3). They also correspond to the Taylor coefficients of power series
expansions of rational functions. Indeed, it follows from the above that there is a
polynomial $r(X)$, of degree less than $n$, so that the power series

$$\sum_{h=0}^{\infty} a_h X^h = \frac{r(X)}{s(X)} \tag{4}$$

is a rational function; to see this multiply by $s(X)$ and note the recurrence relation
(3).

Conversely, suppose we are given a rational function (4) as above, and suppose
$\deg r < \deg s$. A partial fraction expansion, together with the well-known identity

$$(1 - \alpha X)^{-j} = \sum_{h=0}^{\infty} \binom{h + j - 1}{j - 1} \alpha^h X^h,$$

yields

$$\frac{r(X)}{s(X)} = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \frac{r_{ij}}{(1 - \alpha_i X)^j} = \sum_{h=0}^{\infty} \Bigl( \sum_{i=1}^{m} \sum_{j=1}^{n_i} r_{ij} \binom{h + j - 1}{j - 1} \alpha_i^h \Bigr) X^h.$$

The combinatorial symbols displayed are polynomials of degree $j - 1$ in $h$, so the
coefficients of $X^h$, $h = 0, 1, 2, \ldots$ are indeed the values of a generalized power
sum as described.
Accordingly, results on generalized power sums are equivalent to corresponding 
results on the Taylor coefficients of power series expansions of rational functions. 
Later, we will need to deal with exponential polynomials

$$a(z) = \sum_{i=1}^{m} A_i(z) \exp(z \log \alpha_i), \tag{5}$$

the continuations to $\mathbf{C}$ of generalized power sums. These are the solutions of linear
differential equations with constant coefficients. To be precise, with $D = d/dz$, (5)
is annihilated by the differential operator $\prod_{i=1}^{m} (D - \log \alpha_i)^{n_i}$. The order of the
exponential polynomial (5) is $n$, as for the corresponding generalized power sum.

It is plain that an exponential polynomial vanishes identically if and only if all its
coefficients vanish. We see this readily by induction on the order. Indeed, a one
term exponential polynomial $A(z) \exp(z \log \alpha)$ obviously vanishes identically if
and only if $A(z)$ vanishes identically. If (5) vanishes identically, then so does
$(D - \log \alpha_1)a(z)$, which has order $n - 1$. By the induction hypothesis all its
polynomial coefficients vanish; that is, for all $i$ the polynomials
$(D - \log \alpha_1 + \log \alpha_i)A_i(z)$ vanish identically. Then, with the exception of the constant coefficient
of $A_1$, all the polynomials $A_i$ must vanish identically. Our remark about a one
term exponential polynomial guarantees that that coefficient also vanishes, and we
are done.


3. AN APPLICATION TO RECURRENCE SEQUENCES. Let us use the equivalence
of recurrence relations and rational functions to produce a recurrence
sequence in which each integer occurs exactly $k$ times. We write $c^{(k)}$ for the block
$c, c, \ldots, c$ of length $k$. The sequence $0^{(k)}, 1^{(k)}, -1^{(k)}, 2^{(k)}, -2^{(k)}, \ldots$ clearly contains
each integer exactly $k$ times. The corresponding power series is

$$f(x) = x^k + \cdots + x^{2k-1} - x^{2k} - \cdots - x^{3k-1} + 2x^{3k} + \cdots + 2x^{4k-1} - \cdots,$$

which factors as

$$x^k (1 + x + \cdots + x^{k-1})(1 - x^k)(1 + 2x^{2k} + 3x^{4k} + \cdots).$$

This is a rational function, since $1 + 2x^{2k} + 3x^{4k} + \cdots = (1 - x^{2k})^{-2}$. Thus, the
original sequence is a recurrence sequence. With a bit more algebra, we see

$$f(x) = \frac{x^k}{(1 - x)(1 + x^k)^2} = \frac{x^k}{1 - x + 2x^k - 2x^{k+1} + x^{2k} - x^{2k+1}},$$

so the sequence satisfies the relation

$$a_{n+2k+1} = a_{n+2k} - 2a_{n+k+1} + 2a_{n+k} - a_{n+1} + a_n,$$

together with the initial conditions $a_0 = \cdots = a_{k-1} = 0$, $a_k = \cdots = a_{2k-1} = 1$,
$a_{2k} = -1$.
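
The construction can be tried out directly. The sketch below is illustrative only (the function names `block_sequence` and `expected_blocks` are not from the article): it runs the order $2k + 1$ recurrence from the stated initial conditions and compares the result with the block sequence $0^{(k)}, 1^{(k)}, -1^{(k)}, 2^{(k)}, \ldots$ written out by hand.

```python
from collections import Counter


def block_sequence(k, count):
    """First `count` terms produced by the order-(2k+1) recurrence."""
    # Initial conditions: k zeros, k ones, then a single -1.
    a = [0] * k + [1] * k + [-1]
    while len(a) < count:
        n = len(a) - (2 * k + 1)  # the index n in a_{n+2k+1} = ...
        a.append(a[n + 2 * k] - 2 * a[n + k + 1] + 2 * a[n + k] - a[n + 1] + a[n])
    return a[:count]


def expected_blocks(k, count):
    """The blocks 0, 1, -1, 2, -2, ... each repeated k times, built directly."""
    out, j = [0] * k, 1
    while len(out) < count:
        out += [j] * k + [-j] * k
        j += 1
    return out[:count]


# With k = 3 and 11 complete blocks, every integer from -5 to 5 appears 3 times.
k = 3
terms = block_sequence(k, k * 11)
multiplicities = Counter(terms)
```

For $k = 1$ the same function reproduces the opening sequence 0, 1, −1, 2, −2, ... of the article.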


4. AN INTRODUCTION TO p-ADIC ANALYSIS. The absolute value function
defined on the integers has the following properties:


(i) $|x| \ge 0$ for all $x$,
(ii) $|x| = 0$ if and only if $x = 0$,
(iii) $|xy| = |x| \cdot |y|$ for all $x$ and $y$, and
(iv) $|x + y| \le |x| + |y|$ for all $x$ and $y$.


There are other functions that have the same properties. Given any non-zero
integer $n$, and any prime number $p$, we can write $n = p^a m$ with $a$ and $m$ integers,
$a \ge 0$, and $p$ and $m$ relatively prime. Moreover, this expression is unique. Define
the function $|\cdot|_p$ by $|n|_p = p^{-a}$. Thus, for example, $|35|_7 = 1/7$, $|36|_7 = 1$, and
$|36|_3 = 1/9$. If by convention we take $|0|_p = 0$ for all $p$, then it is not hard to see that
all the properties of $|\cdot|$ listed above hold for $|\cdot|_p$, for each $p$. In fact, the last
property holds in a stronger form, namely,


(iv′) $|x + y|_p \le \max(|x|_p, |y|_p)$.


We call $|\cdot|_p$ the p-adic absolute value. Thinking about convergence with respect
to this absolute value leads to some peculiar-looking formulas. For example, for
the geometric series with first term 6 and common ratio 7, the equation


$$6 + 42 + 294 + 2058 + \cdots = -1$$


is a blunder in the usual run of things, but quite correct in the 7-adics.
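
A small computation makes the 7-adic claim concrete. The sketch below is an illustration, not part of the article; the names `padic_valuation`, `padic_abs` and `geometric_gap` are invented for it. It implements $|x|_p = p^{-a}$ by stripping factors of $p$, accepts fractions as well as integers on the assumption that the same rule is applied to numerator and denominator, and measures how far the partial sums of $6 + 42 + 294 + \cdots$ are from $-1$ in the 7-adic absolute value.

```python
from fractions import Fraction


def padic_valuation(x, p):
    """Exponent a such that x = p**a * r/s with r, s prime to p (x nonzero)."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("0 has no finite p-adic valuation")
    a, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        a += 1
    while den % p == 0:
        den //= p
        a -= 1
    return a


def padic_abs(x, p):
    """p-adic absolute value, with the convention |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(x, p))


def geometric_gap(terms):
    """7-adic distance between 6 + 6*7 + ... (`terms` summands) and -1."""
    partial_sum = sum(6 * 7**j for j in range(terms))
    return padic_abs(partial_sum - (-1), 7)
```

Since the partial sum with $N$ terms equals $7^N - 1$, `geometric_gap(N)` is $7^{-N}$, which tends to zero; that is the sense in which the series converges to $-1$.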

The p-adic absolute value is easily continued to a function on the rational
numbers, enjoying properties (i) through (iv′); any rational $x$ can be written as
$x = p^a r/s$ with $a$, $r$, and $s$ integers, and $r$ and $s$ both relatively prime to $p$,
and one sets $|x|_p = p^{-a}$. Thus,


Z|, = 5, and |2|; = 9.

Any rational $x$ has a unique decimal expansion $x = \sum_{j=m}^{\infty} a_j 10^{-j}$ with $a_j$ in
$\{0, 1, \ldots, 9\}$, the series converging in the usual absolute value. So, too, for each $p$,
any rational $x$ has a unique p-adic expansion $x = \sum_{j=m}^{\infty} a_j p^j$ with $a_j$ in
$\{0, 1, \ldots, p - 1\}$, converging in the p-adic absolute value. For example, in the 7-adics we
have

$$\frac{17}{98} = 7^{-2} \cdot \frac{17}{2} = 7^{-2}(2 + 1 \cdot 7 + 3 + 3 \cdot 7 + 3 \cdot 7^2 + \cdots) = 5 \cdot 7^{-2} + 4 \cdot 7^{-1} + 3 + 3 \cdot 7 + 3 \cdot 7^2 + \cdots,$$

where we have used the geometric series expansion $-\frac{1}{6} = 1 + 7 + 7^2 + \cdots$, so that
$\frac{17}{2} = 9 - \frac{1}{2} = 9 + 3 \cdot \bigl(-\frac{1}{6}\bigr)$.
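
The digits of such an expansion can be produced mechanically. The following sketch is one possible digit-by-digit procedure, not something given in the article; `padic_digits` is a hypothetical name, and the modular inverse via `pow(x, -1, p)` needs Python 3.8 or later. At each step the next digit is the residue of the current value modulo $p$; subtracting it and dividing by $p$ gives the value for the next step.

```python
from fractions import Fraction


def padic_digits(x, p, count):
    """Return (m, digits) with x = sum(digits[j] * p**(m + j)) p-adically.

    `x` must be a nonzero rational and `p` a prime; `count` digits are produced.
    """
    x = Fraction(x)
    if x == 0:
        raise ValueError("expansion of 0 is not handled in this sketch")
    m = 0
    # Pull out the power of p so that x becomes a p-adic unit.
    while x.numerator % p == 0:
        x /= p
        m += 1
    while x.denominator % p == 0:
        x *= p
        m -= 1
    digits = []
    for _ in range(count):
        # Residue of x modulo p; the denominator is invertible modulo p.
        d = (x.numerator * pow(x.denominator, -1, p)) % p
        digits.append(d)
        x = (x - d) / p
    return m, digits
```

Called as `padic_digits(Fraction(17, 98), 7, 6)`, it starts at the power $7^{-2}$ with digits 5, 4, 3, 3, 3, 3, in agreement with the expansion displayed above; for $-1$ it yields the digit 6 repeated, matching $6 + 42 + 294 + \cdots = -1$.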


Now consider the sequence 1, 1.4, 1.41, 1.414, 1.4142, ... of decimal approximations
to the square root of two. If $m$ is less than $n$, then the $m$th and $n$th terms of
this sequence differ by less than $10^{-m}$, a quantity which goes to zero as $m$
increases. Such a sequence is called a Cauchy sequence (with respect to the usual 
absolute value). You can’t help feeling such a sequence ought to have a limit, but 
this one doesn’t—if you confine yourself to the rationals [Euc]. In analysis, it is 
useful for Cauchy sequences to have limits, so we embed the rationals in the larger 
set called the reals. Every real number has a decimal expansion, and every Cauchy 
sequence converges—we say the reals are complete. The details of the completion 
process can be found in many introductory analysis texts, for example [Gle]. 

Now consider the sequence $7,\ 7 + 7^4,\ 7 + 7^4 + 7^9,\ 7 + 7^4 + 7^9 + 7^{16}, \ldots$. In
7-adic absolute value, the difference between the $m$th and $n$th terms in this
sequence is $|7^{(m+1)^2} + \cdots + 7^{n^2}|_7 = 7^{-(m+1)^2}$, which goes to zero as $m$ increases. That is
to say, this is a Cauchy sequence, if you view it 7-adically. It ought, then, to have
a limit. It is not a geometric series, so it cannot have a rational limit. By a process
formally identical to the construction of the reals, we embed the rationals in a
larger set we denote $\mathbf{Q}_p$, and call the p-adic rationals. Every p-adic rational has a
p-adic expansion, and the p-adic rationals are complete.

Back to the reals. There are non-constant polynomials which have real coefficients
but no real roots, for example, $x^2 + 1$. If we extend the reals to a field
containing a root of $x^2 + 1$, we obtain the complex numbers. Mirabile dictu, every
non-constant polynomial with complex coefficients has a complex root. We say that
the complex numbers are algebraically closed. The absolute value function is
continued to the complex numbers by $|a + bi| = (a^2 + b^2)^{1/2}$. Mirabile squared,
the complex numbers are complete (with respect to this absolute value). The
important functions of calculus (rational, exponential, trigonometric, ...) can be 
continued to functions of a complex variable, and many problems about real 
functions become easier to handle in this larger domain. 

Back to the p-adic rationals. They are not algebraically closed. For example, if
$\alpha$ in $\mathbf{Q}_7$ were a root of $x^2 - 7 = 0$, we would have $|\alpha|_7 = 7^{-1/2}$, but if $\alpha$ had the
7-adic expansion $\alpha = \sum_{j=m}^{\infty} a_j 7^j$, we would have $|\alpha|_7 = 7^{-m}$ with $m$ an integer.
We can embed $\mathbf{Q}_p$ in an algebraically closed field $\overline{\mathbf{Q}}_p$, although the miracle of “add
one number, get the rest free” does not occur here. We can extend $|\cdot|_p$ to $\overline{\mathbf{Q}}_p$, but
$\overline{\mathbf{Q}}_p$ is not complete. We can complete $\overline{\mathbf{Q}}_p$ to a field $\mathbf{C}_p$, and this field is the p-adic
analogue of the complex numbers; it is complete and algebraically closed. There is
a rich theory of analytic functions on $\mathbf{C}_p$, mirroring that on the complex numbers.

This material can be found in less telegraphic form in [Kob]. 

What is really going on is this: The set of all Cauchy sequences forms a ring
once we define the operations termwise; that the set is closed under the operations
is a consequence of the rules (i)–(iv). One defines the field of reals (respectively
p-adic rationals, according to the particular valuation defining ‘Cauchy’) to be this
ring with sequences ‘with the same limit’ identified. What that means is that we
take the subset of null sequences, those converging to 0, and notice again by the
rules (i)–(iv) that this set is a maximal ideal in the ring of Cauchy sequences. Then
the quotient ring is a field.

The ‘miracle’ of R and C actually is rather special. It turns out that if a field F is
algebraically closed and if L is a subfield of finite codimension in F (in English: if F 
is a finite-dimensional vector space over some field L) then necessarily [F: L] = 2 
(compare [C: R] = 2) and L is an ordered field. That means that L is the disjoint 
union of three sets N, {0} and P with P closed under addition and multiplication 
and N = —P; P is of course the set of positive elements of L. It turns out that L 
can be ordered if and only if —1 is not a sum of squares. A complete orderable 
field is known as a real field and always is a subfield of codimension 2 in an 
algebraically closed field; for all this see for example [L], Chapter XI. By contrast, 
it is not hard to see that $\mathbf{Q}_p$ is not an ordered field.


5. ON PROVING THE SKOLEM-MAHLER-LECH THEOREM. Recall that the
terms of a recurrence are given by a generalized power sum,


$$a(h) = \sum_{i=1}^{m} A_i(h)\,\alpha_i^h, \qquad h = 0, 1, 2, \ldots \tag{1}$$


Given a positive integer $p \ge 2$, every $h$ can be written uniquely as

$$h = r + (p - 1)t, \qquad \text{with } r = 0, 1, \ldots, p - 2 \text{ and } t = 0, 1, 2, \ldots$$

If we write $a_{p,r}(t)$ for $a(h)$, we get

$$a_{p,r}(t) = \sum_{i=1}^{m} A_i(r + (p - 1)t)\,\alpha_i^r \exp(t \log \alpha_i^{p-1}). \tag{6}$$

Now it can be shown that there exist primes $p$ such that the logarithmic and
exponential functions can be continued to analytic functions on $\mathbf{C}_p$; more accurately,
on regions of $\mathbf{C}_p$ large enough for the formula above to make p-adic sense
for $t$ in a closed set $D$ containing the integers. Then $a_{p,r}(t)$ is a p-adic analytic
function on $D$ for $r = 0, 1, \ldots, p - 2$.

Suppose that there are infinitely many $h$ such that $a(h) = 0$. Then there must
be at least one $r$ for which the analytic function $a_{p,r}(t)$ is zero for infinitely many
integers $t$. Of course, there are complex analytic functions which are zero for
infinitely many integer values of their argument; for example, $\sin \pi z$. This can
occur because the integers are an unbounded set in $\mathbf{C}$. Things are different in $\mathbf{C}_p$,
since the integers form a bounded set there; after all, $|n|_p \le 1$ for all integers $n$. It
turns out that a function (whether complex or p-adic) analytic on a closed,
bounded region and with infinitely many zeros in that region must be identically
zero. Thus, $a(r + (p - 1)t)$ vanishes for all integers $t$, and in particular
$a(h)$ is zero for all $h$ in an arithmetic progression. This concludes our sketch of the
proof.

It is a little strange that Theorem A should force us to enter the realm of p-adic 
analysis. Actually that can sort of (but not really) be avoided. It turns out that p 
must be selected so that $\alpha_i^p \equiv \alpha_i \bmod p$ for each $i$. Then (6) has no more than
$n - 1$ integer zeros (so certainly not infinitely many); otherwise it vanishes identically
[RvdP]. The trouble is that there seems only to be a p-adic proof for the
bound. 


6. APPLYING THE SKOLEM-MAHLER-LECH THEOREM. So if there are infinitely
many $h$ such that $a(h) = 0$ then there must be at least one $r$ for which the
analytic function

$$a_{p,r}(t) = \sum_{i=1}^{m} A_i(r + (p - 1)t)\,\alpha_i^r \exp(t \log \alpha_i^{p-1})$$

vanishes identically. So, by our discussion at the end of §2, since the $A_i$ are not
identically zero, the $\log \alpha_i^{p-1}$ cannot all be distinct.

Indeed, the numbers $\alpha_i^{p-1}$ must coincide at least in pairs. Plainly $p - 1$ is not
arbitrary and depends only on the roots $\alpha_i$.

Moreover, we see that the original function

$$a(z) = \sum_{i=1}^{m} A_i(z) \exp(z \log \alpha_i)$$

vanishes at all $z = r + t(p - 1)$ with $t \in \mathbf{Z}$. As an aside we mention that then it
follows that $a(z)$ must be the product of

$$\sin \frac{\pi (z - r)}{p - 1} = \frac{1}{2i}\Bigl[ e^{i\pi(z - r)/(p - 1)} - e^{-i\pi(z - r)/(p - 1)} \Bigr]$$

with some other exponential polynomial. In that sense a recurrence sequence has
infinitely many zeros if and only if it is ‘sinful’.

So, in particular (taking $l = p - 1$, say) we have:


Proposition 1. If a recurrence sequence vanishes infinitely often, then it vanishes on
an arithmetic progression with a common difference $l$ that depends only on the roots.


Now suppose there is a number $k$ such that $a(h) = k$ for infinitely many $h$. Let
$b(h) = a(h) - k$. Then $b(h) = \sum_{i=1}^{m} A_i(h)\alpha_i^h - k \cdot 1^h$ is a generalized power sum
with the same roots as $a(h)$ (and, possibly, the root 1 if it was not already a root of
$a(h)$), hence the same $l$-values as $a(h)$, and $b(h)$ is zero whenever $a(h) = k$. Thus,
$a(h)$ takes on the value $k$ on an arithmetic progression with common difference $l$.

Now there are only $l$ different complete arithmetic progressions of integers with
common difference $l$. So we have established a principal remark of this note,
namely, 


Proposition 2. The number of values that a recurrence sequence can take on infinitely 
often is bounded by some integer | that depends only on the roots. 


It follows immediately that there is no recurrence sequence in which each 
integer occurs infinitely often. 

Nor is there a recurrence sequence in which every Gaussian integer occurs. For
suppose $a_h$ were such a sequence, and let $\sum_{h=0}^{\infty} a_h X^h = r(X)/s(X)$. Then

$$\sum_{h=0}^{\infty} \Re(a_h) X^h = \Re\bigl(r(X)/s(X)\bigr).$$

Now it is easy to see that the real part of a rational function is again a rational
function, so $\Re(a_h)$ is a recurrence sequence, and it takes on every integer infinitely
often. As we have seen, this cannot happen.


7. MULTIPLICITY: A GOOD QUESTION. We restrict ourselves to recurrence
sequences of integers. By the results just explained an integer recurrence sequence
either takes the value 0 infinitely many times, in which case it has special
properties that allow us to say it is degenerate, or only finitely many times. Is there
a bound $\mu(n)$ so that a nondegenerate integer recurrence sequence of order $n$ has
at most $\mu(n)$ zeroes? Of course any given nondegenerate integer recurrence
sequence has a bound on its number of zeroes. Our question is whether there is a
uniform bound for the multiplicity, depending only on the order of the sequence.

It is obvious that $\mu(2) = 1$. (Truly. Give this a few minutes' thought.) The bound
$\mu(3) = 6$ is very much more difficult and has only been confirmed recently [Beu].
The extreme case is


$$a_{n+3} = 2a_{n+2} - 4a_{n+1} + 4a_n, \qquad a_0 = a_1 = 0,\ a_2 = 1.$$


Its six zeroes are $a_0 = a_1 = a_4 = a_6 = a_{13} = a_{52} = 0$.

For larger $n$ there are not even any worthwhile conjectures. The problem
deserves some computer time, say at least so as to guess $\mu(4)$ (which is $\ge 9$).
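
The extreme case is easy to examine by machine, in the spirit of the "computer time" just mentioned. The sketch below is only an illustration (the name `extreme_case_terms` is invented here): it iterates the order 3 recurrence with exact integer arithmetic and lists the indices below 500 at which the sequence vanishes. A finite search of this kind can confirm the listed zeroes but cannot by itself rule out later ones; that is what the cited result supplies.

```python
def extreme_case_terms(count):
    """First `count` terms of a_{n+3} = 2a_{n+2} - 4a_{n+1} + 4a_n, a_0 = a_1 = 0, a_2 = 1."""
    a = [0, 0, 1]
    while len(a) < count:
        a.append(2 * a[-1] - 4 * a[-2] + 4 * a[-3])
    return a[:count]


# Python integers are unbounded, so no overflow occurs for large indices.
terms = extreme_case_terms(500)
zero_indices = [n for n, value in enumerate(terms) if value == 0]
```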


8. RECURRENCE. The question, whether there is a recurrence sequence in 
which each rational occurs, was raised in Crux Mathematicorum in October, 1989. 

Proposition 2 was published in 1959 by Shapiro [Sha], and again some years 
later by Berstel and Mignotte [Ber]. The question, whether there is a recurrence 
sequence in which each Gaussian integer occurs infinitely often, was posed in Crux 
Mathematicorum in June, 1988, and repeated in October 1989. These sequences 
are recurrent in more ways than one! Indeed, Theorem A for recurrence se- 
quences of algebraic numbers was first proved by Mahler in the 30’s, based upon 
an idea of Skolem. Then, Lech published the result for general recurrence 
sequences in 1953. In 1956 Mahler published the same result, apparently indepen- 
dently (but later realized to his chagrin that he had actually reviewed Lech’s paper 
some years earlier, but had forgotten it). 

References not explicitly given here can be found in the survey [vdP].


A Hyperbolic Plane Coloring and the
Simple Group of Order 168


Dana Mackenzie 


Monthly problem 10349, proposed by Raphael M. Robinson, reads: 


The hyperbolic plane is tiled with equilateral triangles meeting seven at each 
vertex. Can the tiles be colored with seven colors in such a way that no two 
tiles of the same color meet, even at a vertex? 


This paper will present the same solution of this problem in three different
ways. The reason for this apparent surfeit of approaches, we shall see, is an unusual
isomorphism between the two groups $PSL_2(\mathbf{Z}_7)$ and $SL_3(\mathbf{Z}_2)$. These groups are
known to be isomorphic (see [2]), but I am not sure whether such a visual
representation of this fact has been given before. We will use our coloring
procedures to explain the isomorphism between $PSL_2(\mathbf{Z}_7)$ and $SL_3(\mathbf{Z}_2)$.


Solution 1. The simplest solution follows along the lines of the argument in [3, pp.
176-7] demonstrating the existence of the (2,3,7) tesselation of the hyperbolic
plane. Figure 1 illustrates a 7-coloring of the 3-holed torus $T_3$ such that no two
triangles of the same color share a vertex. The three-holed torus¹ is formed by
identifying side $2i + 1$ with side $2i + 6$ (mod 14), if the sides are numbered as in
Figure 1. (Note that some of the equilateral triangles are bisected by the lines
which are “sewn together” by this procedure.) Lifting this coloring to the universal
cover of $T_3$, we obtain a 7-coloring of the hyperbolic plane with the desired
property. ∎


Figure 1. A seven-coloring of a hyperbolic polygon that can be “sewn together” to form a three-holed
torus. Gothic numerals 0, 1,...,6 denote the colors of the triangular faces. Italic numerals 1,2,...,14
indicate the ordering of the polygon, used to determine which sides are sewn together.


The trouble with Solution 1 is that, though it is concise, it gives no clue as to 
how the coloring in Figure 1 was generated. In addition, it relies on the somewhat 
serendipitous fact that the (2,3,7) tiling can be generated by lifting a tiling of $T_3$.
In Solutions 2 and 3 we will explain how Figure 1 was derived, and we will assume 
only the standard facts that the (2,3,7) tiling exists and that its (orientation- 
preserving) symmetry group is 

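$$\Gamma = \langle \alpha, \beta, \gamma \mid \alpha^7 = \beta^3 = \gamma^2 = \alpha\beta\gamma = e \rangle.$$

The isometry $\alpha$ can be realized as the rotation by $-2\pi/7$ about the point $P$ in
Figure 2; $\beta$ as the rotation by $-2\pi/3$ about the centroid of $\triangle PQR$; and $\gamma$ as the
rotation by $\pi$ about the midpoint of segment $PQ$.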


Figure 2. The tesselation of Figure 1, with the vertices labeled according to the method of Solution 2. 


For any tesselation (in particular, the one under consideration), we define $F$ to
be the set of faces, $E$ to be the set of edges, and $V$ to be the set of vertices. We
define a coloring of faces, a highlighting of edges, and a labeling of vertices,
respectively, to be maps from $F$, $E$, and $V$ to a finite set. In Solution 2, we will
derive the coloring of Figure 1 from a labeling $L : V \to \mathbf{Z}_7 \cup \{\infty\}$ (where “∞” is an
abstract symbol interpreted in much the same way as in complex analysis). In
Solution 3 we will derive the same coloring from a highlighting $H : E \to \mathbf{Z}_2$.


¹This torus, also called the Klein quartic, seems to be in vogue at the moment. In the past year it was
described in the popular press as “famous... of almost mythological proportions” ([1]), and inspired the
new sculpture “The Eightfold Way,” by H. Ferguson, unveiled in November 1993 at the Mathematical
Sciences Research Institute in Berkeley, CA ([5]).


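Solution 2. To begin, we define a homomorphism


$$\phi : \Gamma \to PSL_2(\mathbf{Z}_7)$$


by $\phi(\alpha) = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$, $\phi(\beta) = \begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}$, $\phi(\gamma) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. (These matrices satisfy
the same identities as $\alpha$, $\beta$ and $\gamma$, so we can be assured that such a homomorphism
exists.) Recall that the matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in PSL_2(\mathbf{Z}_7)$ act on the set $\mathbf{Z}_7 \cup \{\infty\}$
by linear fractional transformations:


$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}(x) = \frac{ax + b}{cx + d} \pmod 7.$$

Throughout this paper we will identify the matrices with the linear fractional
transformations, for simplicity.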
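
The claim that the three matrices satisfy the defining relations of the group can be checked by direct computation. The sketch below is illustrative: the specific matrices, corresponding to $x \mapsto x - 1$, $x \mapsto (x - 1)/x$ and $x \mapsto -1/x$, are an assumption reconstructed so as to agree with the values $\infty, 0, 1, 2$ that appear in the proof of Lemma 1, and all function names are invented for the example. The modular inverse `pow(x, -1, P)` needs Python 3.8 or later.

```python
P = 7
INF = "inf"  # stands for the abstract symbol "infinity"
IDENTITY = ((1, 0), (0, 1))


def matmul(x, y):
    """Product of two 2x2 matrices with entries reduced modulo 7."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) % P for j in range(2))
        for i in range(2)
    )


def matpow(x, n):
    result = IDENTITY
    for _ in range(n):
        result = matmul(result, x)
    return result


def is_identity_in_psl(x):
    """True if x is I or -I modulo 7, i.e. trivial in PSL_2(Z_7)."""
    return x in (IDENTITY, ((P - 1, 0), (0, P - 1)))


def act(m, x):
    """Apply the linear fractional transformation of m to x in Z_7 or INF."""
    (a, b), (c, d) = m
    if x == INF:
        return INF if c % P == 0 else (a * pow(c, -1, P)) % P
    num, den = (a * x + b) % P, (c * x + d) % P
    return INF if den == 0 else (num * pow(den, -1, P)) % P


# Assumed images of the generators (entries written modulo 7, so -1 is 6).
ALPHA = ((1, P - 1), (0, 1))  # x -> x - 1
BETA = ((1, P - 1), (1, 0))   # x -> (x - 1) / x
GAMMA = ((0, P - 1), (1, 0))  # x -> -1 / x
```

With these choices the seventh power of `ALPHA`, the cube of `BETA`, the square of `GAMMA` and the product of all three are each trivial in $PSL_2(\mathbf{Z}_7)$, and `BETA` cycles $\infty \to 1 \to 0 \to \infty$.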

Now we can describe our labeling $L : V \to \mathbf{Z}_7 \cup \{\infty\}$. Given $A \in V$, let $B$ be any
adjacent vertex and let $C$ be the unique adjacent vertex such that $\triangle ABC$ is in the
tesselation and is positively oriented. Then there exists a unique isometry $\chi \in \Gamma$
such that $\chi(P) = A$, $\chi(Q) = B$, and $\chi(R) = C$. Then define $L(A) = [\phi(\chi)](\infty)$.

A much simpler way to compute the labeling $L$ is to apply the following
recursive procedure. Begin by labeling the central vertex $P$ “∞” (i.e. setting
$L(P) = \infty$) and labeling the seven adjacent vertices “0” through “6,” beginning
with $Q$ and proceeding counterclockwise. Extend to a labeling $L(v)$ of all the
vertices in the (2, 3, 7) tesselation by applying the following rule:


Rule 1. If the vertices of $\triangle ABC$ are already labeled, with $L(A) = a$, $L(B) = b$,
$L(C) = c$, where $a, b, c$ are distinct elements of $\mathbf{Z}_7 \cup \{\infty\}$, and if $\triangle ABD$ is an
adjacent triangle, set $L(D) = d$, where $d$ is the unique number in $\mathbf{Z}_7 \cup \{\infty\}$ such that
$(a, b, c, d) \equiv -1 \pmod 7$.

Here $(a, b, c, d)$ denotes the cross ratio.

The uniqueness of d is a standard fact about cross ratios, but there is another 
reason that L may not be well-defined. Since each vertex belongs to seven 
different triangles, and each of these is adjacent to three different triangles, Rule 1 
could give us many conflicting instructions about how to label a given vertex. The 
easiest way to show that it does not, in fact, lead to “conflicting instructions,” is
to verify that our first, non-recursive procedure does give a labeling that satisfies
Rule 1.


Lemma 1. L is a well-defined labeling on V and satisfies Rule 1. 


Proof: We need to show that $L$ does not depend on the choice of the adjacent
vertex $B$. Suppose we had chosen $C$ (the next vertex, proceeding counterclockwise),
thereby using the adjacent triangle $\triangle ACD$ instead of $\triangle ABC$. Then the isometry
mapping $\triangle PQR \to \triangle ACD$ is $\chi' = \chi\alpha^{-1}$, so


$$L(A) = [\phi(\chi')](\infty) = [\phi(\chi\alpha^{-1})](\infty) = [\phi(\chi)][\phi(\alpha^{-1})](\infty) = [\phi(\chi)](\infty),$$


which agrees with the previous definition. Proceeding by induction counterclockwise
about the vertex $A$, we conclude that $L(A)$ is independent of the choice of
the adjacent vertex $B$.

To prove Rule 1 holds, we suppose that $\triangle ABC$ and $\triangle ABD$ are adjacent
triangles in the tesselation. Without loss of generality, suppose that $\triangle ABC$ is
located clockwise from $\triangle ABD$ in the cycle of seven triangles with vertex $A$. Let
$\chi : \triangle PQR \to \triangle ACB$; then $\chi\beta^{-1} : \triangle PQR \to \triangle CBA$, $\chi\beta : \triangle PQR \to \triangle BAC$, and
$\chi\alpha^{-1}\beta : \triangle PQR \to \triangle DAB$. Hence


$$\begin{aligned}
L(C) &= [\phi(\chi)][\phi(\beta^{-1})](\infty) = [\phi(\chi)](0), \\
L(B) &= [\phi(\chi)][\phi(\beta)](\infty) = [\phi(\chi)](1), \\
L(D) &= [\phi(\chi)][\phi(\alpha^{-1})][\phi(\beta)](\infty) = [\phi(\chi)](2).
\end{aligned}$$


Since $\phi(\chi)$ is a linear fractional transformation, it preserves cross ratios. Thus
$(L(A), L(B), L(C), L(D)) = (\infty, 1, 0, 2) \equiv -1 \pmod 7$. ∎


Now we explain how to proceed from the vertex labeling $L$ to a face coloring.
Our coloring will be based on the fact that for any four distinct elements
$a_1, \dots, a_4 \in Z_7 \cup \{\infty\}$, the set of all values of the cross-ratio $(a_{\sigma(1)}, a_{\sigma(2)}, a_{\sigma(3)}, a_{\sigma(4)})$
as $\sigma$ ranges over all permutations is either {2, 4, 6} or {3, 5}. In the latter case, the
subgroup of permutations which fix the cross-ratio has index 2 and hence must be
the alternating group $A_4$. Thus, in particular, if $(a, b, c, d) \equiv 5 \pmod 7$ then
$(c, a, b, d) \equiv (b, c, a, d) \equiv 5 \pmod 7$ as well, while, for example, $(b, a, c, d) \equiv 3
\pmod 7$.
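The claim about cross-ratio values can be checked by brute force. The following is a minimal sketch, not part of the original argument: it assumes the convention $(a, b, c, d) = \frac{(a-c)(b-d)}{(a-d)(b-c)}$ (which gives $(\infty, 1, 0, 2) = -1$ as in the proof above), and the names `INF`, `diff`, and `cross_ratio` are illustrative choices.

```python
from itertools import combinations, permutations

INF = "inf"                       # stand-in symbol for the point at infinity (assumed name)
POINTS = list(range(7)) + [INF]   # the eight labels Z_7 U {inf}

def diff(u, v):
    """u - v mod 7; a factor involving INF is replaced by 1, because each point
    appears once in the numerator and once in the denominator, so the two
    infinite factors cancel in the limit."""
    if u == INF or v == INF:
        return 1
    return (u - v) % 7

def cross_ratio(a, b, c, d):
    """Cross ratio (a, b, c, d) = (a-c)(b-d) / ((a-d)(b-c)) mod 7, assumed convention."""
    num = diff(a, c) * diff(b, d)
    den = diff(a, d) * diff(b, c)
    return num * pow(den, 5, 7) % 7   # den^5 is the inverse of den mod 7

# For every 4-element subset, collect the values taken over all 24 orderings.
value_sets = set()
for quad in combinations(POINTS, 4):
    value_sets.add(frozenset(cross_ratio(*p) for p in permutations(quad)))

print(sorted(sorted(s) for s in value_sets))
```

Under these assumptions only the two value sets named in the text should appear.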

We will define a triple $(a, b, c)$ to be positively oriented if and only if there is a
positively oriented triangle $\triangle ABC$ with labels $L(A) = a$, $L(B) = b$, and $L(C) = c$.
Obviously this will hold if and only if there exists $f \in PSL_2(Z_7)$ such that
$f(a) = \infty$, $f(b) = 0$, and $f(c) = 1$. A linear fractional transformation which
accomplishes this is


$$g = \begin{pmatrix} c - a & (a - c)b \\ c - b & (b - c)a \end{pmatrix}.$$


This transformation can be represented by a matrix of determinant 1 if and only if
$\det g$ is a quadratic residue. (If $\det g = \lambda^2$, then $(1/\lambda)g$ is a matrix of determinant
1 which represents the same transformation.) Thus we may define a positively
oriented triple in a purely number-theoretic fashion: if $a, b, c \ne \infty$, then $(a, b, c)$ is
positively oriented if and only if $\det g = (a - b)(b - c)(c - a)$ is a quadratic
residue (mod 7).

Similarly, one can check that $(a, b, \infty)$ is positively oriented if and only if $(b - a)$
is a quadratic residue (mod 7).
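The number-theoretic orientation test is easy to put into code. This is a rough sketch under one stated assumption: a triple containing $\infty$ in the first or second position is rotated cyclically until $\infty$ comes last, which relies on orientation being unchanged by cyclic rotation. The names `legendre` and `positively_oriented` are hypothetical.

```python
from itertools import permutations

INF = "inf"                       # stand-in symbol for the point at infinity (assumed name)
POINTS = list(range(7)) + [INF]
RESIDUES = {1, 2, 4}              # nonzero quadratic residues mod 7

def legendre(a):
    """Legendre symbol (a/7) for a not divisible by 7."""
    return 1 if a % 7 in RESIDUES else -1

def positively_oriented(a, b, c):
    """Orientation test for a triple of distinct labels."""
    t = (a, b, c)
    if INF in t:
        # Assumption: rotate cyclically so that INF is last, then test (x, y, INF).
        i = t.index(INF)
        x, y = t[(i + 1) % 3], t[(i + 2) % 3]
        return legendre(y - x) == 1
    return legendre((a - b) * (b - c) * (c - a)) == 1

# Count the positively oriented ordered triples of distinct labels.
count = sum(positively_oriented(*t) for t in permutations(POINTS, 3))
print(count)  # 168 = 8 * 7 * 3
```

Exactly half of the ordered triples pass the test, because swapping two entries multiplies the determinant by $-1$, a non-residue mod 7.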

Here is our rule for coloring the faces of the (2, 3, 7) tesselation. 


Rule 2. Define $C(\triangle ABC) = n$ if and only if
$\{L(A), L(B), L(C)\} \subset S_n$ or $\{L(A), L(B), L(C)\} \subset S_n^c$,

where $S_n = \{n, n + 1, n + 3, \infty\}$. (As usual, all the additions are modulo 7. $S_n^c$
denotes the complement of $S_n$ in $Z_7 \cup \{\infty\}$.)

An alternate definition for the sets $S_n$, and the reason for choosing these
particular sets, is given in the next lemma.

Lemma 2. Let $\mathcal{S}$ be the following collection of subsets of $Z_7 \cup \{\infty\}$:

$$\mathcal{S} = \{\{a, b, c, d\} \mid (a, b, c, d) \equiv 3 \text{ iff } (a, b, c) \text{ is positively oriented}\}.$$

Then $\mathcal{S} = \{S_n, S_n^c \mid 0 \le n \le 6\}$, and $\mathcal{S}$ is a (3, 4, 8) Steiner system.
(For the definition of a Steiner system, see, for example, [4].)


Proof: It is easily checked that the sets $\{0, 1, 3, \infty\}$ and $\{2, 4, 5, 6\}$ are in $\mathcal{S}$, and it
follows for the remaining $S_n$'s, since they are obtained from these two by a
translation (mod 7).

Next we note that for each of the $\binom{8}{3} = 56$ triples $\{a, b, c\} \subset Z_7 \cup \{\infty\}$ there is a
unique $d$ which satisfies the condition in the definition of $\mathcal{S}$. This shows that $\mathcal{S}$ is
a Steiner system. Since each set $\{a, b, c, d\} \in \mathcal{S}$ can be generated from four
different triples, there are 14 elements in $\mathcal{S}$. Thus the 14 sets of the form $S_n$ and
$S_n^c$ must be all the elements of $\mathcal{S}$.
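The Steiner property of the 14 sets can also be confirmed directly. The sketch below assumes only the definition $S_n = \{n, n+1, n+3, \infty\}$ from Rule 2; the helper names are illustrative.

```python
from itertools import combinations

INF = "inf"                                  # stand-in for the point at infinity (assumed name)
POINTS = frozenset(range(7)) | {INF}

def S(n):
    """The block S_n = {n, n+1, n+3, inf}, with additions mod 7."""
    return frozenset({n % 7, (n + 1) % 7, (n + 3) % 7, INF})

# The 14 candidate blocks: each S_n together with its complement.
blocks = [S(n) for n in range(7)] + [POINTS - S(n) for n in range(7)]

# For every 3-element subset, count the blocks that contain it.
coverage = {
    frozenset(t): sum(1 for blk in blocks if set(t) <= blk)
    for t in combinations(POINTS, 3)
}

print(len(coverage), set(coverage.values()))  # 56 {1}
```

Each of the 56 triples lies in exactly one block, which is the defining property of a (3, 4, 8) Steiner system.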

The fact that $\mathcal{S}$ is a Steiner system is required for the coloring $C$ defined in
Rule 2 even to make sense: we need to know that for each set of three vertex labels,
there is one and only one color that can be assigned to that set. Now we
confirm that this coloring solves the problem stated at the beginning.


Theorem 3. $C$ defines a coloring of $F$ such that no two triangles of the same color
share a vertex.


Proof: Suppose $C(\triangle ABC) = C(\triangle ADE) = n$. Then $\{L(A), L(B), L(C)\}$ and
$\{L(A), L(D), L(E)\}$ are either both in $S_n$ or both in $S_n^c$, since they are not disjoint.
This means that the sets have at least two elements in common. Since all the
vertices adjacent to $A$ have distinct labelings (a simple consequence of the
construction in Lemma 1), this means that the original triangles also have two
points in common. Suppose, without loss of generality, $C = D$. If $B = E$ also, we
are done. Otherwise, let $a = L(A)$, $b = L(B)$, $c = L(C)$, $e = L(E)$. By Rule 1,
$(a, c, b, e) \equiv -1$. But from Lemma 2 and the fact that $\{a, c, b, e\} = S_n$ or $S_n^c$, it
follows that $(a, c, b, e) \equiv 3$ or 5. By contradiction, we conclude that $\triangle ABC =
\triangle ADE$. ∎


Solution 3. We can describe the coloring of Figure 1 in yet another way, using a 
highlighting of edges instead of a labeling of vertices. Proceeding counterclockwise, 
highlight the seven central edges of the tesselation in colors 0,1,...,6. Extend this 
to a highlighting of E by the following rules: 


Rule 1. The three edges of any triangle must be highlighted in colors that correspond to 
three collinear points in the projective plane illustrated in Figure 3. (Thus, for example, 
if two edges of a triangle are highlighted 0 and 1, then the third edge must be 
highlighted 3.) 


Rule 2. The seven edges incident at any vertex $A$ must be highlighted in colors
$\phi(0), \phi(1), \dots, \phi(6)$, proceeding counterclockwise around $A$, where $\phi$ is an
automorphism of the projective plane. For example, as Figure 4 illustrates, if three
consecutive edges are highlighted 0, 6, 5 (proceeding counterclockwise about a point $A$),
then the remaining edges must be highlighted 2, 1, 3, 4, in that order.

Figure 3. The projective plane of order 7.

Figure 4. An automorphism of the projective plane in Figure 3.

Finally, note that the lines in our projective plane have been labeled in such a
way that

$$\{a, b, c\} \text{ are collinear} \iff \{a, b, c, \infty\} = S_n,$$

where $S_n$ is one of the sets defined in Solution 2. This brings us to the final
coloring rule:


Rule 3. If the edges of a triangle are highlighted $a$, $b$, $c$, then color the triangle in color
$n$, where $\{a, b, c, \infty\} = S_n$.


Though these rules are somewhat clumsier than the rules of Solution 2, the 
reader can verify that they lead to precisely the same coloring. 


THE ISOMORPHISM BETWEEN $PSL_2(Z_7)$ AND $SL_3(Z_2)$. It is more or less
obvious, from Solution 2, that the symmetry group of the (2, 3, 7) tesselation of the
torus $T_3$ (not the entire hyperbolic plane) “should be” $PSL_2(Z_7)$ (identify each
isometry $\gamma$ by the labelings of the vertices $\gamma(P)$, $\gamma(Q)$, $\gamma(R)$). Likewise, the
symmetry group “should be” $SL_3(Z_2)$ (which is the symmetry group of the projective
plane in Figure 3). Thus a consequence of Solutions 2 and 3 is that $PSL_2(Z_7)$
and $SL_3(Z_2)$ are isomorphic. The goal of this section is to construct the isomorphism
as explicitly as possible, without the aid of figures. Why? To prove that the
identification with $PSL_2(Z_7)$ is one-to-one and onto depends on examining Figure
2 to make sure that no two triangles have the same vertex labelings and that each
vertex labeling actually occurs. Similarly, the proof of the second identification
rests on examining the edges meeting at all 24 vertices to make sure that each
automorphism of the projective plane occurs once and no more than once. Though
this is easy enough to do, I would contend that it is a very uninformative, “brute
force” proof.


Lemma 4. $|PSL_2(Z_7)| = |SL_3(Z_2)| = 168$.


Proof: Any element $\gamma \in PSL_2(Z_7)$ is determined uniquely by $\gamma(\infty)$, $\gamma(0)$, and
$\gamma(1)$. There are 8 choices for $\gamma(\infty)$; 7 choices for $\gamma(0)$, and only 3 choices for $\gamma(1)$,
since only half of the 6 elements of $Z_7 \cup \{\infty\}$ which are distinct from $\gamma(\infty)$ and
$\gamma(0)$ yield a positively oriented triple. Thus $|PSL_2(Z_7)| = 8 \cdot 7 \cdot 3 = 168$.

Any element $\gamma \in SL_3(Z_2)$ is also uniquely determined by three values. There
are 7 choices for $\gamma(1, 0, 0)$ (since $\gamma(1, 0, 0) \ne (0, 0, 0)$); 6 choices for $\gamma(0, 1, 0)$;
and 4 choices for $\gamma(0, 0, 1)$, since $\gamma(0, 0, 1) \notin \mathrm{Span}\{\gamma(1, 0, 0), \gamma(0, 1, 0)\}$. Thus
$|SL_3(Z_2)| = 7 \cdot 6 \cdot 4 = 168$. ∎
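Both orders can be double-checked by exhaustive enumeration. The following sketch simply lists matrices of determinant 1; representing an element of $PSL_2(Z_7)$ by the smaller of the tuples for $M$ and $-M$ is an implementation choice made here, not something taken from the text.

```python
from itertools import product

def det3_mod2(m):
    """Determinant mod 2 of a 3x3 matrix given as a flat tuple of 9 entries."""
    a, b, c, d, e, f, g, h, i = m
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % 2

# SL_3(Z_2): all 3x3 matrices over Z_2 with determinant 1.
sl3_z2 = [m for m in product(range(2), repeat=9) if det3_mod2(m) == 1]

# SL_2(Z_7): all 2x2 matrices (a, b, c, d) over Z_7 with ad - bc = 1.
sl2_z7 = [m for m in product(range(7), repeat=4)
          if (m[0] * m[3] - m[1] * m[2]) % 7 == 1]

# PSL_2(Z_7): identify each matrix M with -M by keeping one representative.
psl2_z7 = {min(m, tuple((-x) % 7 for x in m)) for m in sl2_z7}

print(len(sl3_z2), len(sl2_z7), len(psl2_z7))  # 168 336 168
```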


Lemma 5. $SL_3(Z_2) \cong \mathrm{Aut}[(Z_2)^3]$.


Here we are considering $(Z_2)^3$ only as an additive group and “forgetting” the
vector space structure. This lemma is obvious from the fact that any set of three
basis vectors of $(Z_2)^3$ as a vector space is also a set of generators for $(Z_2)^3$ as a
group, and conversely.

We will now define a new operation, $\oplus$, on $Z_7 \cup \{\infty\}$, and show that $(Z_7 \cup
\{\infty\}, \oplus)$ is isomorphic to $(Z_2)^3$. In fact, even more is true: $(Z_7 \cup \{\infty\}, \oplus, +)$ is a
field, with the ordinary addition (mod 7), $+$, serving as the multiplication in the
new field! It would be interesting to know whether there exist any other algebraic
structures like this, where $\cdot$ distributes over $+$ in $Z_7$, which in turn distributes over
$\oplus$ in $Z_7 \cup \{\infty\}$.

The definition of $\oplus$ is motivated by the coloring procedures given in Solutions 2
and 3 above: if $a$ and $b$ are the labels of two adjacent vertices in Solution 2, then
$a \oplus b$ will be the highlighted color of the edge joining them in Solution 3.


Definition 6. If $a \not\equiv 0 \pmod 7$, let $\chi(a) = (a/7)$, the Legendre symbol of $a$
(mod 7). (This simplifies some of the typography below.) That is, $\chi(a) = 1$ if $a$ is a
quadratic residue modulo 7 (i.e. $a \equiv 1, 2,$ or 4), and $\chi(a) = -1$ otherwise.


Definition 7. If $a \ne b \in Z_7$, then $a \oplus b = c$, where $c$ is the unique element of $Z_7$
such that $(a, b, c, \infty) \equiv 5^{\chi(b - a)} \pmod 7$.

In addition, we define $a \oplus a = \infty$ and $a \oplus \infty = \infty \oplus a = a$. Note that $a \oplus b = b
\oplus a$, since $\chi(a - b) = -\chi(b - a)$.


The proof that $Z_7 \cup \{\infty\}$ is a field rests on two fairly simple lemmas, whose
proofs we will leave to the reader.


Lemma 8. Define $\mathcal{P}_n = \{(n, \infty), (n + 1, n + 3), (n + 2, n + 6), (n + 4, n + 5)\}$.
Then any ordered pair $(a, b)$ such that $b - a$ is a quadratic residue is an element of a
unique set $\mathcal{P}_n$. Also, for any two distinct ordered pairs, $(a, b)$ and $(c, d)$ in $\mathcal{P}_n$,
$(a, b, c, d) \equiv 5 \pmod 7$. In particular, it follows that

$$(n + 1) \oplus (n + 3) = n, \quad (n + 2) \oplus (n + 6) = n, \quad \text{and} \quad (n + 4) \oplus (n + 5) = n.$$

Lemma 9. If $a, b \in Z_7$, $a \ne b$, then

$$a \oplus b = \begin{cases} 5a + 3b & \text{if } \chi(b - a) = 1, \\ 3a + 5b & \text{if } \chi(a - b) = 1. \end{cases} \tag{1}$$

In general, the characterization of $\oplus$ in terms of cross ratios is more useful
than equation (1), but Lemma 9 simplifies calculations of $\oplus$, for example if we
were trying to construct the edge-highlighting of Solution 3 directly from the
vertex-labeling of Solution 2. We also need equation (1) in the proof of Theorem
13.
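To see how Lemma 9 turns the new operation into plain arithmetic, here is a small sketch. It computes the sum of two labels from equation (1) and compares the result with the cross-ratio condition of Definition 7, assuming the convention $(a, b, c, \infty) = (a - c)/(b - c)$. The function names are hypothetical.

```python
INF = "inf"                 # stand-in for the point at infinity (assumed name)
RESIDUES = {1, 2, 4}        # nonzero quadratic residues mod 7

def oplus(a, b):
    """The new addition on Z_7 U {inf}, computed from Lemma 9."""
    if a == INF:
        return b
    if b == INF:
        return a
    if a == b:
        return INF
    if (b - a) % 7 in RESIDUES:
        return (5 * a + 3 * b) % 7
    return (3 * a + 5 * b) % 7

def cross_ratio_inf_last(a, b, c):
    """(a, b, c, inf) = (a - c) / (b - c) mod 7 for distinct finite a, b, c
    (assumed cross-ratio convention)."""
    return (a - c) * pow(b - c, 5, 7) % 7   # fifth power = inverse mod 7

# Compare Lemma 9 with Definition 7: the cross ratio should be 5 when b - a is a
# residue and 5^(-1) = 3 otherwise.
agrees = all(
    cross_ratio_inf_last(a, b, oplus(a, b)) == (5 if (b - a) % 7 in RESIDUES else 3)
    for a in range(7) for b in range(7) if a != b
)
print(agrees)
```

The three sums listed in Lemma 8 can be reproduced in the same way, for instance `oplus(1, 3)`, `oplus(2, 6)`, and `oplus(4, 5)` for $n = 0$.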


Theorem 10. (i) If $\lambda \in Z_7$ then $(a + \lambda) \oplus (b + \lambda) = (a \oplus b) + \lambda$.

(ii) If $\chi(\mu) = 1$ then $(\mu a) \oplus (\mu b) = \mu(a \oplus b)$.

(iii) $Z_7 \cup \{\infty\}$ is a field, with addition defined by $\oplus$ and multiplication defined
by $+$ (mod 7).


Proof: The proofs of (i) and (ii) are obvious from the properties of the cross ratio.
Property (ii) is not, of course, necessary for $Z_7 \cup \{\infty\}$ to be a field, but we will use
this property below. Note, incidentally, that (ii) is false if $\chi(\mu) = -1$.

To prove (iii), it remains to verify the group axioms for $\oplus$. The only one that is
not obvious is associativity. Of course, this could be checked by “brute force,” but
again it is more interesting to present a proof which makes use of the properties
of the Legendre symbol and cross ratio. Here is a sketch of the argument, with
details left to the reader.

First, verify that associativity holds when one term is repeated; that is, for
any $a, b \in Z_7$, $(a \oplus a) \oplus b = a \oplus (a \oplus b) = b$. The fact that $(a \oplus a) \oplus b = b$ is
immediate from Definition 7. The second equality follows from the identity
$\chi(a \oplus b - a) = \chi(a - b)$, which can be checked using Lemma 9. Next, using
Lemma 8 and Lemma 3, verify that for any distinct $a, b, c \in Z_7$,

$$a \oplus b = c \oplus d \iff (a, b, c, d) \equiv 5^{\chi(b - a)\chi(c - b)\chi(a - c)} \pmod 7. \tag{2}$$

Finally, using (2), we can prove that associativity holds when all three terms are
distinct. This is trivial if one of the terms is $\infty$. Otherwise, if $a, b, c$ are distinct
elements of $Z_7$,

$$(a \oplus b) \oplus c = (c \oplus d) \oplus c = d, \quad \text{where } (a, b, c, d) \equiv 5^{\chi(b - a)\chi(c - b)\chi(a - c)},$$

and

$$a \oplus (b \oplus c) = a \oplus (a \oplus e) = e, \quad \text{where } (b, c, a, e) \equiv 5^{\chi(c - b)\chi(a - c)\chi(b - a)}.$$

Hence $(a, b, c, d) = (b, c, a, e) = (a, b, c, e)$, and it follows that $d = e$. ∎
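The “brute force” check mentioned in the proof is short enough to write down. The sketch below is not the argument of the text; it tests parts (i) to (iii) of Theorem 10 exhaustively, with the new operation computed from Lemma 9. The helpers `shift` and `scale` are hypothetical names, and they extend $+$ and multiplication to $\infty$ by letting $\infty$ absorb, which is an assumption consistent with $\infty$ being the zero of the new field.

```python
INF = "inf"                          # stand-in for the point at infinity (assumed name)
ELEMENTS = list(range(7)) + [INF]
RESIDUES = {1, 2, 4}                 # nonzero quadratic residues mod 7

def oplus(a, b):
    """The new addition, computed from Lemma 9."""
    if a == INF:
        return b
    if b == INF:
        return a
    if a == b:
        return INF
    if (b - a) % 7 in RESIDUES:
        return (5 * a + 3 * b) % 7
    return (3 * a + 5 * b) % 7

def shift(a, lam):
    """a + lam mod 7, with inf + lam = inf (assumed extension)."""
    return INF if a == INF else (a + lam) % 7

def scale(mu, a):
    """mu * a mod 7, with mu * inf = inf (assumed extension)."""
    return INF if a == INF else (mu * a) % 7

associative = all(oplus(oplus(a, b), c) == oplus(a, oplus(b, c))
                  for a in ELEMENTS for b in ELEMENTS for c in ELEMENTS)

# Part (i): translation by lam distributes over the new addition.
part_i = all(oplus(shift(a, lam), shift(b, lam)) == shift(oplus(a, b), lam)
             for lam in range(7) for a in ELEMENTS for b in ELEMENTS)

# Part (ii): multiplication by a quadratic residue mu respects the new addition.
part_ii = all(oplus(scale(mu, a), scale(mu, b)) == scale(mu, oplus(a, b))
              for mu in RESIDUES for a in ELEMENTS for b in ELEMENTS)

print(associative, part_i, part_ii)
```

With a non-residue such as $\mu = 6$, the same comparison fails for $a = 1$, $b = 3$, in line with the remark that (ii) is false when $\chi(\mu) = -1$.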


Corollary. As a group, $Z_7 \cup \{\infty\} \cong (Z_2)^3$, and hence $\mathrm{Aut}[Z_7 \cup \{\infty\}] \cong SL_3(Z_2)$.


Now we will construct an explicit isomorphism

$$T: PSL_2(Z_7) \to \mathrm{Aut}[Z_7 \cup \{\infty\}].$$

The first thing to notice is that we cannot simply let the elements of $PSL_2(Z_7)$ act
on the symbols in $Z_7 \cup \{\infty\}$ in the normal way, because some linear fractional
transformations do not map the identity $\infty$ to itself, and hence cannot be group
automorphisms for the operation $\oplus$. The correct idea again comes from our
colorings of the (2, 3, 7) tesselation.

When an isometry $\rho$ acts on a triangle whose vertices are labeled $\infty$, $a$, $b$, the
image (by the construction in Solution 2) will have vertices labeled $[\phi(\rho)](\infty)$,
$[\phi(\rho)](a)$, and $[\phi(\rho)](b)$. For simplicity, in Figure 5 we have identified the
isometry $\rho$ with $\phi(\rho)$. We claimed above that if $a$ and $b$ are the colors of two
adjacent vertices in Solution 2, then $a \oplus b$ is the color of the edge joining them in
Solution 3. Thus we obtain the edge-highlighting illustrated in Figure 5. In order
for the highlighting to be consistent with the action of the isometry $\rho$, $\rho$ must act
in the following way on the edge highlighting:

$$\Lambda_\rho(a) = \Lambda_\rho(\infty \oplus a) = \rho(\infty) \oplus \rho(a). \tag{3}$$

Then our conjectured isomorphism $T$ is given by

$$T(\rho) = \Lambda_\rho. \tag{4}$$

Figure 5. The effect of an isometry $\rho$ on the vertex labeling of Solution 2 and the edge highlighting of
Solution 3.


Our task now is twofold: first, to confirm that for any $\rho \in PSL_2(Z_7)$, the map
$\Lambda_\rho$ defined in (3) is indeed an automorphism of $Z_7 \cup \{\infty\}$; second, to confirm that
the map $T$ defined in (4) is an isomorphism.


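Lemma 11. If $\rho(x) = x + b$, $\rho(x) = ax$ (where $\chi(a) = 1$) or $\rho(x) = -x^{-1}$, then
$\Lambda_\rho$ is an automorphism of $Z_7 \cup \{\infty\}$.


Proof: The first two cases follow from Theorem 10. The third case, $\rho(x) = -x^{-1}$,
follows from the following computation. For any $x, y \in Z_7$, such that $x \ne y$:

$$\begin{aligned}
& (x, y, x \oplus y, \infty) \equiv 5^{\chi(y - x)} \\
\Rightarrow\ & (x, y, \infty, x \oplus y) \equiv 5^{\chi(x - y)} \\
\Rightarrow\ & (-x^{-1}, -y^{-1}, 0, -(x \oplus y)^{-1}) \equiv 5^{\chi(x - y)} = 5^{\chi(x^{-1} - y^{-1})\chi(y^{-1})\chi(-x^{-1})} \\
\Rightarrow\ & (-x^{-1}) \oplus (-y^{-1}) = 0 \oplus (-(x \oplus y)^{-1}) \quad \text{(by equivalence (2))} \\
\Rightarrow\ & \Lambda_\rho(x) \oplus \Lambda_\rho(y) = \Lambda_\rho(x \oplus y).
\end{aligned}$$

The cases where $x = y$ or where $x$ or $y$ equal $\infty$ are trivial. Likewise, the proof that $\Lambda_\rho$
is a bijection is straightforward. ∎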


Lemma 12. $\Lambda_{\rho_1 \rho_2} = \Lambda_{\rho_1} \Lambda_{\rho_2}$. Hence $T$ is a homomorphism. Moreover, $\Lambda_\rho \in \mathrm{Aut}[Z_7
\cup \{\infty\}]$ for all $\rho \in PSL_2(Z_7)$.


Proof: We leave to the reader the verification that $\Lambda_{\rho_1 \rho_2} = \Lambda_{\rho_1} \Lambda_{\rho_2}$ if $\rho_1$ has one of
the three forms in Lemma 11. Since these three types of linear fractional
transformations generate $PSL_2(Z_7)$, the same statement holds for all $\rho_1, \rho_2 \in PSL_2(Z_7)$.
Similarly, since compositions of automorphisms are automorphisms, the second
assertion also follows immediately. ∎


Theorem 13. $T$ is an isomorphism. Hence $PSL_2(Z_7) \cong SL_3(Z_2)$.


Proof: Even after all this effort, it's still not quite trivial! First we show that $T$ is
injective. Suppose that $\Lambda_\phi(x) = x$ for all $x$. Then $\phi(\infty) \oplus \phi(x) = x$ and $\phi(\infty) \oplus
\phi(y) = y$ for all $x, y$. Adding, we conclude that $\phi(x) \oplus \phi(y) = x \oplus y$ for all $x, y$.
Adding $x \oplus \phi(y)$ to both sides, we have $x \oplus \phi(x) = y \oplus \phi(y)$ for all $x, y$. Hence
there is a constant $\lambda$ such that $x \oplus \phi(x) = \lambda$ for all $x$, or $\phi(x) = x \oplus \lambda$. We
claim this cannot be a linear fractional transformation unless $\lambda = \infty$. Otherwise,
from Lemma 9 we have

$$\phi(x) = \begin{cases} 5x + 3\lambda & \text{if } \chi(\lambda - x) = 1, \\ 3x + 5\lambda & \text{if } \chi(x - \lambda) = 1. \end{cases}$$

But since $\phi(x)$ agrees with the linear fractional transformation $5x + 3\lambda$ for the
three values of $x$ such that $\chi(\lambda - x) = 1$, and since linear fractional transformations are
determined by their values at three points, then $\phi(x) = 5x + 3\lambda$ for all $x$. This
contradicts the equation above. Hence $\lambda = \infty$, and $\phi(x) = x \oplus \infty = x$, so that $\phi$ is
the identity map.

The very last step is trivial. Since $T$ is one-to-one, and by Lemma 4 the number
of elements in its domain and range are equal, $T$ is onto. Hence $T$ is an
isomorphism. ∎
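The construction can also be tested numerically. The following sketch is an illustration only, not the proof above: it builds the map of equation (3) for every matrix of determinant 1 over $Z_7$, with the new addition taken from Lemma 9, and checks that the resulting maps are automorphisms and that 168 distinct ones occur. The names `mobius`, `Lambda`, and `is_automorphism` are hypothetical.

```python
from itertools import product

INF = "inf"                          # stand-in for the point at infinity (assumed name)
ELEMENTS = list(range(7)) + [INF]
RESIDUES = {1, 2, 4}

def oplus(a, b):
    """The new addition, computed from Lemma 9."""
    if a == INF:
        return b
    if b == INF:
        return a
    if a == b:
        return INF
    if (b - a) % 7 in RESIDUES:
        return (5 * a + 3 * b) % 7
    return (3 * a + 5 * b) % 7

def mobius(m, x):
    """Apply x -> (a x + b) / (c x + d) on Z_7 U {inf}."""
    a, b, c, d = m
    if x == INF:
        return INF if c % 7 == 0 else a * pow(c, 5, 7) % 7
    den = (c * x + d) % 7
    if den == 0:
        return INF
    return (a * x + b) * pow(den, 5, 7) % 7   # fifth power = inverse mod 7

def Lambda(m):
    """Equation (3): Lambda_rho(x) = rho(inf) (+) rho(x), stored as a tuple of images."""
    return tuple(oplus(mobius(m, INF), mobius(m, x)) for x in ELEMENTS)

def is_automorphism(images):
    f = dict(zip(ELEMENTS, images))
    bijective = len(set(images)) == len(ELEMENTS)
    additive = all(f[oplus(x, y)] == oplus(f[x], f[y])
                   for x in ELEMENTS for y in ELEMENTS)
    return bijective and additive

sl2_z7 = [m for m in product(range(7), repeat=4)
          if (m[0] * m[3] - m[1] * m[2]) % 7 == 1]
images = {Lambda(m) for m in sl2_z7}          # M and -M give the same map

print(len(images), all(is_automorphism(p) for p in images))
```

A count of 168 distinct automorphisms is what injectivity of $T$ together with Lemma 4 predicts.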


FINAL REMARK. We have seen that the 7-coloring of the hyperbolic plane in
Figure 1 is intimately connected with the existence of an isomorphism between
$PSL_2(Z_7)$ and $SL_3(Z_2)$. But this isomorphism is highly unusual: in fact, it is the
only isomorphism between finite simple groups of Lie type listed in [2]. It would be
interesting to investigate whether $d$-colorings of the (2, 3, $d$) tesselation for $d > 7$
exist; whether they can be associated with vertex and edge colorings as we have
done here; and, if so, whether any interesting group-theoretic consequences may
follow.

Cosine Products, Fourier Transforms,
and Random Sums 


Kent E. Morrison 


1. INTRODUCTION. The function $\sin x / x$ is endlessly fascinating. By setting
$x = \pi/2$ in the infinite product expansion

$$\frac{\sin x}{x} = \prod_{k=1}^{\infty} \cos\frac{x}{2^k} \tag{1}$$

one gets the first actual formula for $\pi$ that mankind ever discovered, dating from
1593 and due to François Viète (1540-1603), whose Latinized name is Vieta. (Was
any notice taken of the formula's 400th anniversary, perhaps by the issue of a
postage stamp?) From the samples of a function $f(x)$ at equally spaced points $x_n$,
$n \in Z$, one can reconstruct the complete function with the aid of $\sin x / x$, provided
$f$ is “band-limited” and the spacing of the samples is small enough. This is the
content of the Sampling Theorem, which lends its name to $\sin x / x$ as the sampling
function. Its importance in signal processing, where it is also known as sinc $x$, is
the result of its Fourier transform being the characteristic function of the interval
$[-1, 1]$ (modulo a scalar factor).

In Section 2 we prove the infinite product expansion for $\sin x / x$ and derive
Viète's formula. In Section 3 we transform the product expansion with the Fourier
transform and use convolution and delta distributions to prove it in a way that
reveals a host of similar identities. Section 4 puts these identities into a probabilis- 
tic setting, and in Section 5 we alter the probability experiments in order to make 
connections between infinite cosine products, Cantor sets, and sums of series with 
random signs, particularly the harmonic series. This leaves us with some interesting 
unsolved problems and conjectures for further work. 


2. AN ELEMENTARY PROOF. Repeated use of the double angle formula for the
sine shows that

$$\begin{aligned}
\sin x &= 2 \sin\frac{x}{2} \cos\frac{x}{2} \\
&= 4 \sin\frac{x}{4} \cos\frac{x}{4} \cos\frac{x}{2} \\
&\;\;\vdots \\
&= 2^n \sin\frac{x}{2^n} \prod_{k=1}^{n} \cos\frac{x}{2^k}.
\end{aligned}$$

But

$$\lim_{n \to \infty} 2^n \sin\frac{x}{2^n} = x,$$

thereby proving the identity. See Figure 1 for an indication of how quickly the
product converges.

Figure 1. Graphs of $\sin x / x$ and $\cos(x/2)\cos(x/4)\cos(x/8)$. Where both graphs are visible, $\sin x / x$ is
nearer the x axis.
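The speed of convergence can also be seen numerically. The following is a small sketch (function names are illustrative) that checks the finite identity from the double angle formula and compares partial products with $\sin x / x$.

```python
import math

def partial_product(x, n):
    """cos(x/2) * cos(x/4) * ... * cos(x/2^n)."""
    prod = 1.0
    for k in range(1, n + 1):
        prod *= math.cos(x / 2**k)
    return prod

def sinc(x):
    """sin(x)/x, with the limiting value 1 at x = 0."""
    return math.sin(x) / x if x != 0 else 1.0

x = 2.5
for n in (3, 6, 12, 24):
    # The gap shrinks roughly by a factor of 4 each time n increases by 1.
    print(n, abs(partial_product(x, n) - sinc(x)))

# The finite identity sin x = 2^n sin(x/2^n) * (partial product) holds for every n.
finite_identity_gap = abs(2**10 * math.sin(x / 2**10) * partial_product(x, 10) - math.sin(x))
```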

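Let $x = \pi/2$, make use of the half-angle identity, and there you have Viète's
formula for $\pi$,

$$\frac{2}{\pi} = \frac{\sqrt{2}}{2} \cdot \frac{\sqrt{2 + \sqrt{2}}}{2} \cdot \frac{\sqrt{2 + \sqrt{2 + \sqrt{2}}}}{2} \cdots \tag{2}$$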
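As a quick numerical illustration of the nested-radical product, the sketch below multiplies the first $n$ factors; `viete_product` is a hypothetical helper name.

```python
import math

def viete_product(n):
    """Product of the first n factors sqrt(2)/2, sqrt(2 + sqrt(2))/2, ..."""
    radical = 0.0
    prod = 1.0
    for _ in range(n):
        radical = math.sqrt(2 + radical)   # next nested radical
        prod *= radical / 2                # equals cos(pi / 2^(k+1)) at step k
    return prod

for n in (1, 5, 10, 20):
    print(n, 2 / viete_product(n))         # approximations to pi
```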


At this point the cosine identity could remain an isolated curiosity of historical 
interest, relegated to the ends of exercise sets in textbooks. In fact, it is just the 
first of an infinite family of cosine product identities for sin x /x. 


3. THE FOURIER TRANSFORM AND MORE IDENTITIES. For a complex
valued function $f(x)$ defined on the real line, the Fourier transform puts together
$f$ as a continuous linear combination of the “pure” oscillations $e^{i\omega x}$ in which the
coefficient in front of $e^{i\omega x}$ is denoted by $\hat{f}(\omega)$. Thus,

$$f(x) = \int_{-\infty}^{\infty} \hat{f}(\omega) e^{i\omega x} \, d\omega. \tag{3}$$

The function $\hat{f}$ is the Fourier transform of $f$ and the integral above is a
description of how to get back $f$ from $\hat{f}$ and is actually the formula for the inverse
transform. How do we get $\hat{f}$ from $f$? That is given by this integral:

$$\hat{f}(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(x) e^{-i\omega x} \, dx. \tag{4}$$


Of course, the proofs of these relationships involve hypotheses on the functions so 
that the integrals make sense, but they can be extended beyond the realm of 
ordinary functions to generalized functions or distributions. We need more than 
ordinary functions in order to make sense of the Fourier transform of a sine or 
cosine. 


Notation: we also write the Fourier transform of $f$ as $\mathcal{F}(f)$ and the inverse
transform of $\phi$ as $\mathcal{F}^{-1}(\phi)$.
Consider $\cos bx$, which by Euler's Identity may be written as

$$\cos bx = \tfrac{1}{2}\left(e^{ibx} + e^{-ibx}\right).$$

This shows the function written as a linear combination of just two of the functions
$e^{i\omega x}$ for $\omega = b$ and $\omega = -b$. The coefficients appear to be 1/2, but if we use them
in the integral form with all other coefficients zero, then we cannot represent the
cosine function. Instead, we must regard the coefficients as point masses at $b$ and
$-b$. Therefore, the Fourier transform of $\cos bx$ is $(1/2)(\delta_b + \delta_{-b})$, where $\delta_b$
denotes the Dirac delta distribution or point mass at the point $b$. All of this can be
made rigorous, but at the expense of some long development in graduate level
analysis. The approach here is at about the level of a second year course in
engineering mathematics.

In addition, the Fourier transform behaves nicely on a product of functions and
turns it into the convolution of the transforms:

$$\widehat{fg} = \hat{f} * \hat{g}. \tag{5}$$

For two functions $\phi(\omega)$ and $\psi(\omega)$, the convolution $\phi * \psi$ is defined by

$$(\phi * \psi)(\omega) = \int_{-\infty}^{\infty} \phi(\alpha) \psi(\omega - \alpha) \, d\alpha. \tag{6}$$

Again, we must extend convolution beyond the realm of functions. In particular we
need convolutions of delta distributions and for them we can easily show that $\delta_0$
behaves as the identity for convolution

$$\delta_0 * \phi = \phi \tag{7}$$

and that

$$\delta_a * \delta_b = \delta_{a+b}. \tag{8}$$

Now back to the cosine identity. Let $f(x) = \prod_{k=1}^{\infty} \cos(x/2^k)$ and let $f_n$ be the
$n$th partial product. The Fourier transform of $f_n$ is

$$\hat{f}_n = {*}\!\prod_{k=1}^{n} \tfrac{1}{2}\left(\delta_{1/2^k} + \delta_{-1/2^k}\right).$$

The asterisk in front of the product sign indicates a repeated convolution of the
factors. Expanding for $n = 3$ we see that

$$\hat{f}_3 = \tfrac{1}{8}\left(\delta_{-7/8} + \delta_{-5/8} + \cdots + \delta_{7/8}\right).$$

Likewise

$$\hat{f}_n = \frac{1}{2^n} \sum_{b \in B_n} \delta_b,$$

where $B_n$ is the set of $2^n$ equally spaced numbers from $-1 + 1/2^n$ to $1 - 1/2^n$
with spacing $2/2^n = 1/2^{n-1}$.

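The sequence of measures $\hat{f}_n$ converges to the uniform density on $[-1, 1]$ of
total mass 1, which we can write as $\tfrac{1}{2}\chi_{[-1,1]} \, d\omega$. The inverse transform is easy
to compute:

$$\mathcal{F}^{-1}\!\left(\tfrac{1}{2}\chi_{[-1,1]}\right)(x) = \frac{1}{2}\int_{-1}^{1} e^{i\omega x} \, d\omega = \frac{\sin x}{x}.$$

The spectrum of $(\sin x)/x$ is uniform in the interval $-1 \le \omega \le 1$. This means
that $\sin x / x$ is a continuous linear combination of the “pure” harmonics $e^{i\omega x}$ with
the same weight of 1/2 for each $\omega \in [-1, 1]$.

With this proof we have a way to generate a family of similar identities. Let us
put point masses at $3^n$ equally spaced points from $-1 + 1/3^n$ to $1 - 1/3^n$ with
spacing $2/3^n$. Such a measure is the convolution ${*}\prod_{k=1}^{n} \tfrac{1}{3}(\delta_{-2/3^k} + \delta_0 + \delta_{2/3^k})$.
Applying the inverse transform

$$\mathcal{F}^{-1}\!\left({*}\!\prod_{k=1}^{n} \tfrac{1}{3}\left(\delta_{-2/3^k} + \delta_0 + \delta_{2/3^k}\right)\right)(x) = \prod_{k=1}^{n} \frac{1}{3}\left(1 + 2\cos\frac{2x}{3^k}\right)$$

and taking limits gives us the infinite product identity

$$\prod_{n=1}^{\infty} \frac{1}{3}\left[1 + 2\cos\frac{2x}{3^n}\right] = \frac{\sin x}{x}. \tag{9}$$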

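Let us use the positive integer $p$ as the base (we have just seen $p = 2$ and
$p = 3$). The first measure $\hat{f}_1$ is the sum of point masses at $p$ points equally spaced
from $-1 + 1/p$ to $1 - 1/p$ with spacing $2/p$.

$$\hat{f}_1 = \frac{1}{p}\left(\delta_{\frac{1-p}{p}} + \delta_{\frac{3-p}{p}} + \delta_{\frac{5-p}{p}} + \cdots + \delta_{\frac{p-1}{p}}\right) \tag{10}$$

$$\phantom{\hat{f}_1} = \frac{1}{p}\sum_{j=0}^{p-1} \delta_{\frac{2j+1-p}{p}}. \tag{11}$$

We let

$$\hat{f}_n = {*}\!\prod_{k=1}^{n} \frac{1}{p}\sum_{j=0}^{p-1} \delta_{\frac{2j+1-p}{p^k}},$$

and one can see that $\hat{f}_n$ consists of $p^n$ point masses equally spaced from
$-1 + 1/p^n$ to $1 - 1/p^n$ with spacing $2/p^n$. Taking the inverse transform we see
that

$$\mathcal{F}^{-1}(\hat{f}_n)(x) = \prod_{k=1}^{n} \frac{1}{p}\sum_{j=0}^{p-1} \exp\!\left((2j + 1 - p)\,ix/p^k\right).$$

Rewriting the exponential as cosines and taking limits gives the general identities.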


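There is a slight difference in the form depending on the parity of $p$. For $p$
even

$$\prod_{k=1}^{\infty} \frac{1}{p}\Biggl[\sum_{\substack{1 \le m \le p-1 \\ m \text{ odd}}} 2\cos\frac{mx}{p^k}\Biggr] = \frac{\sin x}{x}. \tag{12}$$

For $p$ odd

$$\prod_{k=1}^{\infty} \frac{1}{p}\Biggl[1 + \sum_{\substack{1 \le m \le p-1 \\ m \text{ even}}} 2\cos\frac{mx}{p^k}\Biggr] = \frac{\sin x}{x}. \tag{13}$$

For $p = 6$ the identity takes the form

$$\prod_{k=1}^{\infty} \frac{1}{6}\left[2\cos\frac{x}{6^k} + 2\cos\frac{3x}{6^k} + 2\cos\frac{5x}{6^k}\right] = \frac{\sin x}{x}. \tag{14}$$

For $p = 7$ the identity takes the form

$$\prod_{k=1}^{\infty} \frac{1}{7}\left[1 + 2\cos\frac{2x}{7^k} + 2\cos\frac{4x}{7^k} + 2\cos\frac{6x}{7^k}\right] = \frac{\sin x}{x}. \tag{15}$$
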
For larger p fewer terms in the product are needed for the same degree of 
accuracy in the approximation to sin x/x. In fact, by letting p go to infinity the 
first factor alone approaches sin x/x and provides a novel derivation of a well- 
known result. I leave it to the reader to work it out. 
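The base-$p$ identities are easy to test numerically. This sketch evaluates each factor as the average of $\cos((2j + 1 - p)x/p^k)$ over $j = 0, \dots, p - 1$, which is the real form of the exponential sum above (the sine parts cancel in pairs); the function names are illustrative.

```python
import math

def factor(p, x, k):
    """k-th factor: (1/p) * sum over j of cos((2j + 1 - p) x / p^k)."""
    return sum(math.cos((2 * j + 1 - p) * x / p**k) for j in range(p)) / p

def base_p_product(p, x, n):
    """Partial product of the first n factors for base p."""
    prod = 1.0
    for k in range(1, n + 1):
        prod *= factor(p, x, k)
    return prod

x = 3.0
target = math.sin(x) / x
for p in (2, 3, 6, 7):
    # Larger bases need fewer factors for the same accuracy.
    print(p, [abs(base_p_product(p, x, n) - target) for n in (1, 2, 4)])
```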


4. PROBABILISTIC INTERPRETATION. Mark Kac, in his delightful and now 
classic Carus monograph [2], proves the first cosine identity (1) in a way that is 
equivalent to the one we have outlined, although he does not explicitly use the 
Fourier transform, delta functions, and convolution. He then turns the identity 
into a question of probability, which for him was the leitmotif of his mathematical 
work. 

The original product identity (1) arises from the following experiment. Flip a 
fair coin repeatedly. Beginning with 0, add 1/2 if the result is heads and subtract 
1/2 if the result is tails. On the next toss add or subtract 1/4; on the next add or 
subtract 1/8, and so on. What is the distribution of the sums over the probability 
space whose elements are the countable sequences of coin tosses? Clearly the sums 
are distributed uniformly between —1 and 1. 

Let $s_n$ denote the $n$th partial sum. It is a sum of independent random variables
$a_1 + a_2 + \cdots + a_n$, where $a_k$ has the probability distribution $(1/2)(\delta_{1/2^k} + \delta_{-1/2^k})$.
The probability distribution of a sum of independent random variables is the
convolution of the respective distributions of the random variables. Therefore, $s_n$
has the distribution

$${*}\!\prod_{k=1}^{n} \tfrac{1}{2}\left(\delta_{1/2^k} + \delta_{-1/2^k}\right) = \frac{1}{2^n}\left(\delta_{\frac{1-2^n}{2^n}} + \cdots + \delta_{\frac{2^n-1}{2^n}}\right).$$

The inverse Fourier transform of a probability measure is called its characteristic
function. Thus, the characteristic function for the distribution of $s_n$ is the
product $\prod_{k=1}^{n} \cos(x/2^k)$. In the theory of probability and statistics, characteristic
functions are a powerful tool. Typically computations are done with characteristic 
functions in order to draw conclusions about distributions of random variables as 
in the standard proof of the Central Limit Theorem. Here, however, we have 
inverted the relationship in order to compute with the probability measures and to 
get results about the characteristic functions. 
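The convolution picture can be made concrete with exact rational arithmetic. The sketch below convolves the two-point distributions for the coin-tossing experiment and lists the resulting point masses; `convolve` and `partial_sum_distribution` are hypothetical helper names.

```python
from collections import Counter
from fractions import Fraction

def convolve(dist1, dist2):
    """Convolution of two discrete distributions given as {point: mass} dicts."""
    out = Counter()
    for a, pa in dist1.items():
        for b, pb in dist2.items():
            out[a + b] += pa * pb      # delta_a * delta_b = delta_{a+b}
    return dict(out)

def partial_sum_distribution(n):
    """Distribution of s_n = a_1 + ... + a_n with a_k = +1/2^k or -1/2^k, equally likely."""
    dist = {Fraction(0): Fraction(1)}  # delta_0, the identity for convolution
    for k in range(1, n + 1):
        step = {Fraction(1, 2**k): Fraction(1, 2), Fraction(-1, 2**k): Fraction(1, 2)}
        dist = convolve(dist, step)
    return dist

dist3 = partial_sum_distribution(3)
for point in sorted(dist3):
    print(point, dist3[point])         # eight points from -7/8 to 7/8, each of mass 1/8
```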


5. RELATED PRODUCTS: EXAMPLES AND CONJECTURES 


5.1. Coin tossing and Cantor sets. The Cantor set $K$ is the set of points between 0
and 1 whose ternary expansion has no 1's in it. So $z$ is in $K$ if $z = \sum_{k=1}^{\infty} t_k 3^{-k}$,
$t_k \in \{0, 2\}$. Define $K_n$ to be the set of elements of $K$ that have the form
$\sum_{k=1}^{n} t_k 3^{-k}$, and define a probability measure supported on $K_n$,

$$\mu_n = \frac{1}{2^n} \sum_{z \in K_n} \delta_z. \tag{16}$$

$K_n$ has $2^n$ elements so $\mu_n$ is equally distributed on $K_n$. The sequence $(\mu_n)$ has a
limit $\mu$, which can be described as assigning the following limit as the measure of a
set $E$:

$$\mu(E) = \lim_{n \to \infty} \frac{\#(E \cap K_n)}{2^n}. \tag{17}$$

The measure $\mu$ is also the Lebesgue-Stieltjes measure of the Cantor function. The
Cantor function is continuous, non-decreasing, and has derivative zero on the
complement of the Cantor set. Thus it defines a measure supported on the Cantor
set, which is precisely the measure $\mu$ defined in (17).

What is of interest in this note is that $\mu_n$ is the finite convolution product

$$\mu_n = {*}\!\prod_{k=1}^{n} \tfrac{1}{2}\left(\delta_0 + \delta_{2/3^k}\right). \tag{18}$$

Consider the experiment of tossing a fair coin. On toss number $k$ let

$$a_k = \begin{cases} 0 & \text{heads} \\ 2/3^k & \text{tails.} \end{cases}$$

Let $s_n = \sum_{k=1}^{n} a_k$. Then $s_n$ is equally distributed over $K_n$. The characteristic
function for the distribution of $s_n$ is $\prod_{k=1}^{n} (1/2)(1 + e^{2ix/3^k})$. Define

$$f(x) = \prod_{k=1}^{\infty} \tfrac{1}{2}\left(1 + e^{2ix/3^k}\right). \tag{19}$$

(One checks easily that the product is convergent.) Then $\hat{f} = \mu$, the Cantor
measure, but is it possible to characterize $f$ in any other way?
This leads us to look at the related infinite product $\prod_{k=1}^{\infty} \cos(2x/3^k)$. Because

$$\cos\frac{2x}{3^k} = \tfrac{1}{2}\left(e^{2ix/3^k} + e^{-2ix/3^k}\right),$$

the probabilistic interpretation is clear: add or subtract $2/3^k$ on the $k$th toss with
equal probability. Let $s_n$ be the sum of the first $n$ values. What is the distribution
of $s_n$ and what is the distribution of $s = \lim_{n \to \infty} s_n$? The exercise of expanding and
plotting the values of $s_n$ leads one to suspect that $s$ is distributed “uniformly” over
the Cantor set constructed from $[-1, 1]$ by successively removing middle thirds.
That is easy to prove, as follows.

Define the affine map of $[0, 1]$ to $[-1, 1]$ by $z \mapsto 2(z - 1/2)$. Let $z = \sum t_k 3^{-k}$,
$t_k \in \{0, 2\}$, be a point in the Cantor set. The ternary expansion of 1/2 is $\sum 3^{-k}$, and
so $2(z - 1/2) = \sum 2(t_k - 1)3^{-k}$. The coefficients $2(t_k - 1)$ are either 2 or $-2$ with
equal probability.

This shows that the infinite product $\prod_{k=1}^{\infty} \cos(2x/3^k)$ has Fourier transform
equal to the Cantor measure on the Cantor set constructed from $[-1, 1]$ by
removing middle thirds, but it does not give us a closed form like $\sin x / x$. It would
be most surprising if there were any simpler description of $\prod_{k=1}^{\infty} \cos(2x/3^k)$. In
Figure 2 is a plot of the partial product with $n = 8$ and $0 \le x \le 100$. (The function
is even.) Over this range the infinite product is indistinguishable from the eighth
partial product. The self-similarity of the Cantor set at smaller and smaller scales
appears to be reflected in the self-similarity of the graph at higher and higher
frequencies.


5.2. Harmonic Series with Random Signs. We have been looking at the sums of
series of the form

$$\sum_{k=1}^{\infty} t_k c_k \tag{20}$$

where $t_k$ is randomly chosen to be 1 or $-1$ with equal probability. Rademacher
proved that if $\sum c_k^2 < \infty$, then the sum converges with probability one on the
probability space $\Omega = \{-1, 1\}^{\mathbb{N}}$. ($\Omega$ can be identified with the unit interval and the
probability measure with Lebesgue measure by using binary representations of
numbers in the interval.) In [2] Kac gives the proof of this theorem due to Paley
and Zygmund. It is also a theorem that the series diverges with probability one if
$\sum c_k^2 = \infty$. Let us consider the random harmonic series

$$\sum_{k=1}^{\infty} \frac{t_k}{k} \tag{21}$$

which converges almost surely by Rademacher's result, with the goal of
understanding the distribution of the sums. This means we want to understand the
distribution of the random variable $s$ defined on $\Omega$. If we let $s_n$ be the partial sum,
also a random variable, then the probability distribution of $s_n$ is the measure

$$\mu_n = {*}\!\prod_{k=1}^{n} \tfrac{1}{2}\left(\delta_{1/k} + \delta_{-1/k}\right) \tag{22}$$

and its inverse transform is

$$\mathcal{F}^{-1}(\mu_n)(x) = \prod_{k=1}^{n} \cos\frac{x}{k}. \tag{23}$$

Figure 2. Graph of $\prod_{k=1}^{8} \cos(2x/3^k)$.


The product converges uniformly on compact sets as $n \to \infty$, and so it is plausible
that the sequence $\mu_n$ converges to a probability measure $\mu$ that is the distribution
of the random variable $s$. There is, however, a fair bit of analysis to make this
rigorous. Assuming that the analysis can be made rigorous, then the plot of the
Fourier transform of the infinite product $\prod_{k=1}^{\infty} \cos(x/k)$ will show how the sums
are distributed. Let us call this function $\phi(\omega)$. Then

$$\begin{aligned}
\phi(\omega) &= \mathcal{F}\!\left(\prod_{k=1}^{\infty} \cos\frac{x}{k}\right)(\omega) && (24) \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega x} \prod_{k=1}^{\infty} \cos\frac{x}{k} \, dx && (25) \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} (\cos \omega x - i \sin \omega x) \prod_{k=1}^{\infty} \cos\frac{x}{k} \, dx && (26) \\
&= \frac{1}{\pi}\int_{0}^{\infty} \cos \omega x \prod_{k=1}^{\infty} \cos\frac{x}{k} \, dx. && (27)
\end{aligned}$$

Figure 3. Graph of $\phi(\omega)$.


There is not a closed form for $\varphi(\omega)$ and so we resort to numerical integration. We
truncated the infinite product at $n = 1000$ and integrated from 0 to 15 using a
straightforward Riemann sum with $dx = 0.02$ and the midpoints of the subintervals
for the points of evaluation. Values for $\omega$ were from 0 to 3.8 in multiples of 0.2.
The integration was done with True BASIC on a portable Macintosh. See Figure 3.
The distribution is very flat for $-1 < \omega < 1$, much flatter than a normal
distribution. A few of the computed values are given in this table. The value of
$\varphi(0)$ is suspiciously close to 1/4, suggesting perhaps that $\pi/4$ is the value of the
integral

    $\int_{0}^{\infty} \prod_{k=1}^{\infty} \cos\frac{x}{k}\, dx.$    (28)

One might also conjecture that $\int_{0}^{\infty} \cos 2x \prod_{k=1}^{\infty} \cos(x/k)\, dx = \pi/8$.
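The numerical integration just described can be sketched in a few lines of Python. This is a minimal reconstruction, not the authors’ True BASIC program: the function name `phi` and its parameter names are assumptions made for illustration, while the truncation of the product at 1000 factors, the upper limit 15 and the step 0.02 are the values quoted in the text.

```python
import math

def phi(omega, n_terms=1000, upper=15.0, dx=0.02):
    """Midpoint Riemann sum for
    (1/pi) * integral_0^upper cos(omega*x) * prod_{k=1}^{n_terms} cos(x/k) dx."""
    steps = round(upper / dx)
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * dx              # midpoint of the j-th subinterval
        prod = 1.0
        for k in range(1, n_terms + 1):
            prod *= math.cos(x / k)     # truncated infinite product
        total += math.cos(omega * x) * prod
    return total * dx / math.pi
```

Evaluating `phi(0.2 * i)` for `i = 0, ..., 19` covers the same grid of $\omega$ values; under these assumptions `phi(0.0)` and `phi(2.0)` should land close to 1/4 and 1/8 respectively.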

For additional evidence we turned to simulations of the sums. Using MATLAB
we ran 5000 partial sums of $\sum_k t_k/k$ with the values of $t_k$ picked randomly as $\pm 1$ with
equal probability. Figure 4 shows a histogram of the sums.
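A simulation of this kind can be sketched as follows. The original was run in MATLAB; the Python below is only an illustration, and the number of terms per sum (`n_terms=200`), the seed, and the bin layout of `histogram` are assumptions, since the text fixes only the number of sums (5000) and the $\pm 1$ signs.

```python
import random

def random_sums(n_sums=5000, n_terms=200, seed=0):
    """Draw n_sums partial sums of sum_k t_k / k with independent signs t_k = +1 or -1."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n_sums):
        s = 0.0
        for k in range(1, n_terms + 1):
            s += rng.choice((-1.0, 1.0)) / k
        sums.append(s)
    return sums

def histogram(values, lo=-4.0, hi=4.0, n_bins=32):
    """Count the values falling into n_bins equal bins on [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts
```

Since the variance of the full series is $\sum 1/k^2 = \pi^2/6$, the sample variance of the simulated sums gives a quick sanity check on the output.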


Figure 4. Histogram of 5000 random sums.
    $\omega$     $\varphi(\omega)$
           .249995
           .249991
    0.2    .249972
    0.4    .249809
    0.6    .249092
    0.8    .246819
    1.0    .241289
    1.2    .230494
    1.4    .212941
    1.6    .188425
    1.8    .158271
    2.0    .125000
    2.2    .091729
    2.4    .061576
    2.6    .030596
    2.8    .019506
    3.0    .008711
    3.2    .003181
    3.4    .000908
    3.6    .000192
    3.8    .000028
Proof requires a person who can give and a person who can
receive... 


—Augustus De Morgan (1808-1871) 


Budget of Paradoxes. London: 1872, p. 262. 
How to Add Fast—on Average


Geza Schay 


In a recent article in this Monthly C. C. McGeoch [1] described an ingenious
method for parallel addition of two $n$-bit binary integers in $2\log_2 n + 1$ steps, as
well as related methods for the sum of three or more integers. These constructions
have suggested the subject of the present paper, a very simple, iterative algorithm
for the sum of two numbers, which works on average about twice as fast, that is, in
about $\log_2 n$ steps. Thus it adds two 64-bit numbers in about six short steps. This is
close to amazing, since these are numbers on the order of $10^{19}$. Apparently, however,
the variable length of the method limits its practical usefulness, but it leads to the
interesting mathematical problem of finding the probability distribution and the
expected value of the length.

This length, that is, the number of steps in the new algorithm, is determined by 
the length of the longest “carry sequence” (i.e. the longest string of consecutive 
non-zero carries) in the ordinary addition algorithm as applied to binary numbers. 
In a classic paper Burks, Goldstine and von Neumann [2, pp. 45-46] have 
discussed the latter length and obtained $\log_2 n$ as an upper bound for its expected
value. However, they did not propose any new algorithm to take advantage of this 
bound; they only suggested that “either the carries must be accelerated, or use 
must be made of the average number of carries or both.” Our algorithm, which 
does this, has been described in the book by Scott [3, pp. 54-55], although without 
any discussion of the number of steps. In the present paper we make the 
connection between the new algorithm and the mathematical analysis of [2], and 
carry it somewhat further by providing an asymptotic formula rather than an upper 
bound for the expected value, and an approximate evaluation of the probability 
distribution as well. 


THE ALGORITHM. We start with an example. We add the binary digits without 
any carries and in a second number we save the carries, including zeroes, in the 
places where the carries would go. We repeat this until all the carry digits become 
zero: 


        1 1 0 1 0 1 0 0 1
        1 0 0 1 0 1 0 1 1

        0 1 0 0 0 0 0 1 0
      1 0 0 1 0 1 0 0 1 0

      1 0 1 1 0 1 0 0 0 0
      0 0 0 0 0 0 0 1 0 0

      1 0 1 1 0 1 0 1 0 0
      0 0 0 0 0 0 0 0 0 0

In general, let $a = a_n a_{n-1} \ldots a_0$ and $b = b_n b_{n-1} \ldots b_0$ be two nonnegative
integers in binary form and define recursively, for $i = 0, \ldots, n$ and $k = 1, 2, \ldots$,

    $a_i(0) = a_i, \quad b_i(0) = b_i,$    (1)
    $a_i(k) = a_i(k-1) + b_i(k-1) \bmod 2,$    (2)
    $b_{i+1}(k) = a_i(k-1) \cdot b_i(k-1)$ and $b_0(k) = 0.$    (3)

The computation ends when we reach the first $k$ such that $b_i(k) = 0$ for every $i$, in
which case the $a_i(k)$, for $i = 0, \ldots, n + 1$, constitute the digits of $a + b$.

In the example the algorithm has thus produced the sum a + b = 1011010100 
in three steps. Clearly it always leads to the sum, since the carries are added in at 
the same places as in conventional addition, they are just handled in a different 
order. 
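The recursion (1)-(3) can be written compactly on whole machine words, which may make the example easier to follow. The sketch below is an illustration in Python, not code from the paper; the function name `add_iterative` is an assumption, and the digitwise sum mod 2 and the shifted carries are expressed with the bit operations XOR, AND and a left shift.

```python
def add_iterative(a, b):
    """Return (a + b, number of steps) for nonnegative integers a and b."""
    steps = 0
    while True:
        # (2): digitwise sum mod 2; (3): carry digits, moved one place to the left.
        a, b = a ^ b, (a & b) << 1
        steps += 1
        if b == 0:                  # all carry digits are zero: a holds the sum
            return a, steps

# The two addends of the example above.
total, steps = add_iterative(0b110101001, 0b100101011)
```

For the addends of the example this yields the sum 1011010100 in three steps, matching the worked computation.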


THE NUMBER OF STEPS. In the worst case we need $n + 2$ steps. For randomly
chosen addends, however, the worst case almost never happens and Table 1 near
the end of the paper will indicate, in addition to other results, that the number of
steps is under $\log_2 n + 4$ in more than 99% of the cases. The example above is
rather typical in this respect.

While a proof for $\log_2 n$ as an approximation to the expected number of steps is
fairly involved, it is easy to give a heuristic argument for this expectation: For any
value $k$ in the computation, if the carry digit $b_i(k)$ is 1, then $b_{i+1}(k + 1)$ is 1 if and
only if $a_i(k)$ is 1 as well. Since, by the assumed randomness, the latter event occurs
with probability 1/2, we can expect on average about half as many 1’s among the
$b_i(k + 1)$ as among the $b_i(k)$. Thus, if $n = 2^k$, then half this number of 1’s can be
halved about $k = \log_2 n$ times to make all carry digits zero.

Let us turn now to the detailed discussion of the probabilities involved. 

First, we assume that the digits of a and b are independent Bernoulli random 
variables, each digit being 0 or 1 with probability 1/2. 

Examining the algorithm we see that in the first step a carry digit 1 is generated
only when corresponding digits $a_i$ and $b_i$ are both 1. (In the example above this
occurs for $i = 0, 3, 5, 8$.) In the next step, such a carry digit is propagated to $k = 2$
if and only if it is added to a 1. (In the example this occurs only for $i = 1$.) Such a
1, however, can come only from a combination 0,1 or 1,0 for $a_i$ and $b_i$. If $a_i$ and $b_i$
are both 0 or both 1, then their sum is 0 mod 2, and so the propagation of the carry
digit 1 is stopped. (In the example this happens with the carry digits $b_4(1) =
b_6(1) = b_9(1) = 1$.) The propagation of the carries follows the same rules for all
values of $k$. Thus we see that a carry digit 1 is started at $k = 0$ at combinations of
the form 1,1. Such carry digits are propagated to higher values of $k$ by adjacent
combinations at $k = 0$ of the form 0,1 or 1,0, and are stopped by combinations at
$k = 0$ of the form 0,0 or 1,1.

For example in the addition of the two numbers below 


        1 1 0 1 0 1 0 0 1 1
        1 0 1 1 0 1 0 1 0 1


the 1,1 at $i = 0$ would generate a 1-valued carry, which would then be propagated by
the 1,0 and the 0,1, and finally stopped by the 0,0 at $i = 3$, that is, would result in
0-valued carries up to $i = 3$ when $k = 4$. At the same time the 1-valued carry
generated by the 1,1 at $i = 4$ would be immediately stopped by the 0,0 at $i = 5$.
The same would happen to the carry from $i = 9$, while the carry generated at
$i = 6$ would be stopped by the 1,1 at $i = 9$. Thus all carry digits would become 0
when $k = 4$.

The considerations above suggest the following definition: Given two nonnegative
integers $a = a_n a_{n-1} \ldots a_0$ and $b = b_n b_{n-1} \ldots b_0$ in binary form, we shall call a
sequence of consecutive pairs $(a_i, b_i)$ a run of length $r$ if it starts (from the right) by
(1,1), is followed by $r - 1$ pairs of the form (0,1) or (1,0), and ends with (0,0),
(1,1), or with $(a_n, b_n)$ if the latter is the last of the $r - 1$ pairs of the form (0,1)
or (1,0).

Thus the two numbers in the first example have a run of length 2 and three runs 
of length 1, whereas the two numbers of the second example have two runs of 
length 3 and two runs of length 1. 

Notice the exceptional case in the definition, when the run ends at the left end. 
For instance the two examples below both show runs of length 2 (and the second 
one also a run of length 1): 


        0 1          1 0 1
        1 1 ,        1 1 1 .


(One could, of course, modify the numbers in the first case by starting them with 
leading zeroes, but that would create a problem in our recursion below.) 

It follows at once from the foregoing that the number of steps equals one plus
the length of the longest run. Thus we want to obtain the distribution of this
length.
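The relation between the number of steps and the longest run can be checked by brute force on short addends. The following Python sketch is illustrative only; the names `count_steps` and `longest_run` are assumptions, and `n_digits` is the number of digit pairs scanned, that is, $n + 1$ in the notation of the text.

```python
def count_steps(a, b):
    """Number of steps of the iterative addition algorithm."""
    steps = 0
    while True:
        a, b = a ^ b, (a & b) << 1
        steps += 1
        if b == 0:
            return steps

def longest_run(a, b, n_digits):
    """Length of the longest run among the pairs (a_i, b_i), i = 0, ..., n_digits - 1."""
    best = cur = 0
    for i in range(n_digits):               # scan from the right
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:                       # (1,1): ends any run and starts a new one
            cur = 1
        elif ai != bi and cur:              # (0,1) or (1,0): extends an active run
            cur += 1
        else:                               # (0,0), or no active run to extend
            cur = 0
        best = max(best, cur)
    return best

# Exhaustive check over all pairs of 6-digit addends.
ok = all(count_steps(a, b) == 1 + longest_run(a, b, 6)
         for a in range(64) for b in range(64))
```

A run that reaches the left end is counted with the length it has reached there, which is the exceptional case of the definition.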
Define the following random variables: For the random pair $a = a_n a_{n-1} \ldots a_0$
and $b = b_n b_{n-1} \ldots b_0$ let $L$ = length of last (that is, left-most) run, $M_n$ = length of
longest run. We are interested in finding the distribution of $M_n$. The simplest way
to do this is by writing a recursive formula for the tail probabilities of $M_n$ as
follows. The event $\{M_n > r\}$, for any $r = 0, \ldots, n$, can be decomposed as the union
of two mutually exclusive events: $\{M_{n-1} > r\}$ and the event $\{M_n > r,\ M_{n-1} \le r\}$.
The latter event can occur if and only if the last run has length $r + 1$ and the
longest run in the remaining places has length $r$ or less. Thus we can write for the
corresponding probabilities

    $P(M_n > r) = P(M_{n-1} > r) + P(L = r + 1,\ M_{n-r-1} \le r).$    (4)


Writing $q_{nr} = P(M_n > r)$ and making use of the known special form of the last run
and the independence of the digits, we obtain the difference equation

    $q_{nr} = q_{n-1,r} + \frac{1}{4}\left(\frac{1}{2}\right)^{r}\left(1 - q_{n-r-1,r}\right).$    (5)

Clearly, we also have $q_{nr} = 0$ if $r > n$.
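The difference equation can be iterated directly to obtain exact tail probabilities. The sketch below is an illustration under the assumption that the recursion reads $q_{nr} = q_{n-1,r} + \frac{1}{4}(\frac{1}{2})^r (1 - q_{n-r-1,r})$ with $q_{nr} = 0$ whenever $r > n$ or $n < 0$; the function names are hypothetical, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def tail_probabilities(n_max, r):
    """Return [q_{0,r}, ..., q_{n_max,r}], where q_{n,r} = P(M_n > r)."""
    c = Fraction(1, 4) * Fraction(1, 2) ** r
    q = []
    for n in range(n_max + 1):
        if n < r:
            q.append(Fraction(0))                   # q_{nr} = 0 if r > n
            continue
        prev = q[n - 1] if n >= 1 else Fraction(0)  # q_{n-1,r}
        rest = q[n - r - 1] if n - r - 1 >= 0 else Fraction(0)  # q_{n-r-1,r}
        q.append(prev + c * (1 - rest))
    return q

def expected_longest_run(n):
    """E(M_n) as the sum of the tail probabilities q_{nr}, r = 0, ..., n."""
    return sum(tail_probabilities(n, r)[n] for r in range(n + 1))
```

For $r = 0$ the recursion reduces to $1 - q_{n,0} = \frac{3}{4}(1 - q_{n-1,0})$, so $q_{n,0} = 1 - (3/4)^{n+1}$, the probability that at least one of the $n + 1$ digit pairs is (1,1).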
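The above equations can be solved by the method of generating functions and
partial fractions. (See Feller [4].) Multiplying through by $x^n$ and summing over $n$,
we get

    $\sum_{n=r}^{\infty} q_{nr} x^n = x \sum_{n=r}^{\infty} q_{n-1,r} x^{n-1} + \frac{1}{4}\left(\frac{1}{2}\right)^{r} \sum_{n=r}^{\infty} x^n - \frac{x}{4}\left(\frac{x}{2}\right)^{r} \sum_{n=r}^{\infty} q_{n-r-1,r}\, x^{n-r-1}.$    (6)

Denoting the generating function by $Q_r(x)$ and summing the geometric series in
the second term on the right, we can rewrite this as

    $Q_r(x) = x Q_r(x) + \frac{1}{4}\left(\frac{x}{2}\right)^{r} \frac{1}{1 - x} - \frac{x}{4}\left(\frac{x}{2}\right)^{r} Q_r(x).$    (7)

Hence

    $Q_r(x) = \frac{x^r}{(1 - x)\left(x^{r+1} - 2^{r+2} x + 2^{r+2}\right)}.$    (8)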
The probabilities we want to find are the coefficients in the power series
expansion of this $Q_r(x)$. To compute them we express the right hand side as a sum
of partial fractions and expand those as geometric series. It is easy to see that the
second factor in the denominator has a simple zero at some $x_1$ near 1. This $x_1$ can
be found approximately by setting $x = 1 + z$ in the equation

    $x^{r+1} - 2^{r+2} x + 2^{r+2} = 0$    (9)

and neglecting $o(z)$ terms. This results in the equation

    $1 + (r + 1)z - 2^{r+2} z = 0$    (10)

and so

    $z \approx \frac{1}{2^{r+2} - r - 1}$ and $x_1 \approx 1 + \frac{1}{2^{r+2} - r - 1}.$    (11)
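The quality of this first-order approximation can be examined by refining it with Newton’s method applied to the polynomial $x^{r+1} - 2^{r+2} x + 2^{r+2}$. The Python below is a small illustrative sketch; the function names and the fixed iteration count are assumptions, and Newton’s method is a choice made here, not a technique taken from the paper.

```python
def f(x, r):
    """Second factor of the denominator of the generating function."""
    return x ** (r + 1) - 2 ** (r + 2) * x + 2 ** (r + 2)

def x1_approx(r):
    """First-order approximation 1 + 1/(2^(r+2) - r - 1) to the zero near 1."""
    return 1 + 1 / (2 ** (r + 2) - r - 1)

def x1_newton(r, iterations=50):
    """Refine the approximation by Newton's method, starting from x1_approx(r)."""
    x = x1_approx(r)
    for _ in range(iterations):
        slope = (r + 1) * x ** r - 2 ** (r + 2)   # derivative of f at x
        x -= f(x, r) / slope
    return x
```

For $r = 0$ the polynomial is linear and the approximation is exact, $x_1 = 4/3$; for $r = 1$ the zero near 1 of $x^2 - 8x + 8$ is $4 - 2\sqrt{2}$.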


Notice that every zero of the denominator in Equation (8) is simple. For if we
write

    $f(x) = x^{r+1} - 2^{r+2} x + 2^{r+2}$    (12)

then any multiple zero $x$ of this polynomial would also satisfy

    $f'(x) = (r + 1)x^r - 2^{r+2} = 0,$    (13)

and so

    $x^r = \frac{2^{r+2}}{r + 1}.$    (14)

Substituting this into $f(x) = 0$ we would obtain

    $\frac{2^{r+2}}{r + 1}\, x - 2^{r+2} x + 2^{r+2} = 0$    (15)

with the solution

    $x = \frac{r + 1}{r}.$    (16)

Equations (14) and (16) are incompatible for integer values of $r$ and so no multiple
zeroes can exist.
Denoting the zeroes of $f(x)$ by $x_1, x_2, \ldots, x_{r+1}$, we can thus decompose $Q_r(x)$
into partial fractions as

    $Q_r(x) = x^r \left( \frac{A_0}{1 - x} + \frac{A_1}{x_1 - x} + \frac{A_2}{x_2 - x} + \cdots + \frac{A_{r+1}}{x_{r+1} - x} \right).$    (17)

Expanding each of the fractions into a geometric series, we get for the
coefficients

    $q_{nr} = A_0 + \frac{A_1}{x_1^{n-r+1}} + \frac{A_2}{x_2^{n-r+1}} + \cdots + \frac{A_{r+1}}{x_{r+1}^{n-r+1}}.$    (18)

Here $x_1$ is the root with the smallest absolute value, and the terms with the other
roots can be neglected in comparison when $n - r$ is large. For practical purposes
this means all interesting cases, since $q_{nr} \approx 0$ for small $n - r$ already when $n = 8$,
as Table 1 below shows. Thus we get

    $q_{nr} \approx A_0 + \frac{A_1}{x_1^{n-r+1}}.$    (19)
TABLE 1.

     r      $x_1$
     0    1.333333
     1    1.171573
     2    1.078379
     3    1.035999
     4    1.016950
     5    1.008197
     6    1.004016
     7    1.001984
     8    1.000985
     9    1.000491
    10    1.000245
    11    1.000122
    12    1.000061
    13    1.000031
    14    1.000015

    $n$                                     8        16       32       64
    $E(M_n) = \sum_{r=0}^{n} q_{nr}$      2.366    3.351    4.341    5.335



We can easily evaluate the two coefficients here using Equations (8), (11) and (17),
and obtain

    $q_{nr} \approx 1 - \frac{1}{x_1^{n-r+1}}.$    (20)

Now $E(M_n) = \sum_{r=0}^{n} q_{nr}$, and so this expected value can be obtained from the
above formula. But, unfortunately, $x_1$ depends on $r$ in a fairly complicated
manner, and this precludes evaluation of the sum in closed form. Thus we present
a numerical evaluation in Table 1 for $n$ = 8, 16, 32 and 64. In the last column an
approximation to $q_{nr}$ is given for $n = 64$, which we are now going to explain.

Since, with our earlier notation, $x_1 = 1 + z$, we have

    $q_{nr} \approx 1 - \frac{1}{(1 + z)^{n-r+1}} = 1 - \left[1 + \frac{(n - r + 1)z}{n - r + 1}\right]^{-(n-r+1)}$
    $\approx 1 - e^{-(n-r+1)z} \approx 1 - \exp\left(-\frac{n - r + 1}{2^{r+2}}\right).$    (21)

If we set $n = 2^k$ and neglect the linear terms, then we get

    $q_{nr} \approx 1 - e^{-2^{k-r-2}}.$    (22)

Comparison of the last two columns of the table shows the amazing accuracy of
this approximation. Furthermore, a Monte Carlo computer simulation for $n = 16$
produced 4.34 for the average number of iterations, in close agreement with the
1 + 3.35 expected from the table.
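A Monte Carlo experiment of the kind mentioned here is easy to reproduce in outline. The sketch below is not the author’s program: the function names, the number of trials and the seed are assumptions, and the addends are taken to have $n + 1$ independent uniformly random binary digits $a_n \ldots a_0$ and $b_n \ldots b_0$.

```python
import random

def count_steps(a, b):
    """Number of steps of the iterative addition algorithm."""
    steps = 0
    while True:
        a, b = a ^ b, (a & b) << 1
        steps += 1
        if b == 0:
            return steps

def average_steps(n, trials=20000, seed=1):
    """Average number of steps over random addends with digits a_n ... a_0 and b_n ... b_0."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = rng.getrandbits(n + 1)      # n + 1 independent fair binary digits
        b = rng.getrandbits(n + 1)
        total += count_steps(a, b)
    return total / trials
```

Under these assumptions `average_steps(16)` should come out near the figure quoted in the text, up to sampling error.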

The above exponential approximation to $q_{nr}$ leads to the $\log_2 n$ estimate for
$E(M_n) = \sum_{r=0}^{n} q_{nr}$ as follows: When $k - r \ge 5$, then $q_{nr} \approx 1$ and so, increasing $k$
by 1 we just add an extra term of 1 to the sum. Thus

    $E(M_{2n}) \approx 1 + E(M_n)$    (23)

for any $n$ sufficiently large, and so

    $E(M_n) \approx \log_2 n + c.$    (24)

From the table we find that $c \approx -0.65$, and so the expected number of iterations,
being one more than this, is approximately $\log_2 n + 0.35$.
How can we expect teachers to plan curriculum, choose texts, explain
mathematics to their colleagues and to parents or to give a sound
picture of it to their students, unless they have some knowledge and
experience of the history and philosophy of mathematics and its role
in contemporary culture?


——K. O. May 
NOTES


Edited by: John Duncan 


Fibonacci-like Sequences and Greatest 
Common Divisors 


H. R. Morton 


It is a curious feature of the Fibonacci sequence $\{f_n\}$ that the greatest common
divisor $(f_m, f_n)$ of two terms in the sequence is itself the $k$-th term in the sequence,
with $k = (m, n)$. This result and its extension to sequences satisfying the recurrence
relation

    $f_{n+1} = a f_n + b f_{n-1},$

starting with $f_0 = 0$, when $a$ and $b$ are any coprime integers, is proved by Lucas
[L1], [L2]. The traditional proof, which is nicely presented in Hardy and Wright
[HW 148-9], uses relations between the sequence $\{f_n\}$ and an auxiliary sequence,
describing both sequences in terms of the roots of the quadratic $t^2 - at - b$. The
purpose of this article is to present a proof which uses only simple congruence
features of the sequence $\{f_n\}$. The result is stated below as Theorem A. It is
deduced readily from Theorem B, which shows that the terms $f_n$ in the sequence
which are divisible by any fixed $d$ are regularly spaced.


Theorem A. Let $\{f_n\}$ be the sequence of integers determined by the initial conditions
$f_0 = 0$, $f_1 = 1$ and the recurrence relation

    $f_{n+1} = a f_n + b f_{n-1},$

where $a$ and $b$ are any two coprime integers. Then $(f_m, f_n) = \pm f_{(m,n)}$.


Remark. The choice of $f_1 = 1$ is not important; any other choice will just result in
a multiple of the same sequence.
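Theorem A is easy to test numerically for small indices. The Python sketch below is only an illustration; the function names `lucas_like` and `check_theorem_a` and the index limit are assumptions made here, and the sign ambiguity in the theorem is handled by comparing with the absolute value of $f_{(m,n)}$.

```python
from math import gcd

def lucas_like(a, b, count):
    """First `count` terms of f_0 = 0, f_1 = 1, f_{n+1} = a*f_n + b*f_{n-1}."""
    f = [0, 1]
    while len(f) < count:
        f.append(a * f[-1] + b * f[-2])
    return f[:count]

def check_theorem_a(a, b, limit=25):
    """True if gcd(f_m, f_n) = |f_gcd(m,n)| for all 1 <= m, n <= limit."""
    f = lucas_like(a, b, limit + 1)
    return all(gcd(f[m], f[n]) == abs(f[gcd(m, n)])
               for m in range(1, limit + 1)
               for n in range(1, limit + 1))
```

The check succeeds for coprime pairs such as $(a, b) = (1, 1)$ and $(3, 2)$, while for the non-coprime pair $(2, 2)$ it already fails at $m = 2$, $n = 3$, where $f_2 = 2$ and $f_3 = 6$.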


Theorem B. Let $\{f_n\}$ be the sequence of integers defined in Theorem A. Let $d$ be a
positive integer, and let $S$ be the set of integers $N$ for which $f_N$ is divisible by $d$. Then $S$
consists of all multiples of some integer $k$, depending on $d$ and the sequence.
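The regular spacing asserted by Theorem B can be observed directly by reducing the sequence mod $d$. The following Python is a hedged sketch: the function names and the search limit are assumptions, and the value of $k$ it reports is only meaningful when the first positive index in $S$ lies below that limit (otherwise it reports 0, the case in which $S$ contains only 0 within the range searched).

```python
def divisible_indices(a, b, d, limit):
    """Indices N in [0, limit] with f_N divisible by d, computed mod d."""
    prev, cur = 0, 1 % d                 # f_0 and f_1 reduced mod d
    indices = [0]                        # f_0 = 0 is always divisible by d
    for n in range(1, limit + 1):
        if cur == 0:                     # here cur is f_n mod d
            indices.append(n)
        prev, cur = cur, (a * cur + b * prev) % d
    return indices

def spacing(a, b, d, limit=200):
    """Return (k, ok): the first positive index in S (0 if none found) and
    whether S, within [0, limit], is exactly the set of multiples of k."""
    idx = divisible_indices(a, b, d, limit)
    k = idx[1] if len(idx) > 1 else 0
    expected = list(range(0, limit + 1, k)) if k else [0]
    return k, idx == expected
```

For the Fibonacci sequence and $d = 7$ this gives $k = 8$, since $f_8 = 21$ is the first positive term divisible by 7.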


In what follows we shall use standard congruence notation and algebra; thus
$f_n \equiv 0 \bmod d$ means that $f_n$ is divisible by $d$. The only property of gcd which is
needed is that every common divisor of two numbers also divides their gcd. In the
case of general coprime coefficients $a$ and $b$ we need the result that if $d$ divides $bc$
and is coprime to $b$ then $d$ divides $c$, or equivalently, in the context of congruences,
that any number coprime to $d$ has an inverse mod $d$. In the case $b = \pm 1$
the proofs use more elementary arguments, involving only addition and multiplication
mod $d$.
Deduction of Theorem A from Theorem B: Let $d$ be a positive integer and let
$f_m \equiv 0 \bmod d$ and $f_n \equiv 0 \bmod d$.

Consider the set $S$ of integers $N$ for which $f_N \equiv 0 \bmod d$. By Theorem B this
set consists of all multiples of some $k$. Now $m, n \in S$, by hypothesis. Thus $m$ and
$n$ are each divisible by $k$ and hence also their gcd, $(m, n)$, is divisible by $k$. The
integer $(m, n)$ thus belongs to $S$, which in turn means that $f_{(m,n)} \equiv 0 \bmod d$.

Now choose $d = (f_m, f_n)$. Then $f_m$ and $f_n$ are both divisible by $d$. The
argument above shows that $f_{(m,n)}$ is also divisible by $d = (f_m, f_n)$.

Conversely, choose $d = f_{(m,n)}$ and again consider the set $S$ of integers $N$ for
which $f_N \equiv 0 \bmod d$. Then $S$ consists, by Theorem B, of all multiples of some $k$.
Clearly $(m, n) \in S$, since $f_{(m,n)}$ is divisible by $d$, and hence $(m, n)$ is a multiple of
$k$. Now $m$ and $n$ are multiples of $(m, n)$, and hence are also multiples of $k$. So
$m, n \in S$ and thus $f_m$ and $f_n$ are both divisible by $d$. It follows at once that their
gcd, $(f_m, f_n)$, is divisible by $d = f_{(m,n)}$.

We have already established that $f_{(m,n)}$ is divisible by $(f_m, f_n)$. Thus $f_{(m,n)} =
\pm (f_m, f_n)$, as claimed.


It remains to establish Theorem B. This is most simply done in the case
$b = \pm 1$, when $a$ can be any integer, by extending the sequence to include the
terms $f_n$ for negative integers $n$ also. The proof follows from two simple propositions;
modifications of these needed to prove the general case are then given.
Finally an alternative proof of Theorem B is indicated, along lines suggested by the
referee.


Proposition 1. Let $\{f_n\}$ be a sequence of integers satisfying the recurrence relation
$f_{n+1} = a f_n + b f_{n-1}$, where $a$ and $b$ are integers. Suppose that $f_n \equiv 0 \bmod d$. Then for
every $k \le n$ the terms $f_{n \pm k}$ are related by

    $f_{n+k} + (-b)^k f_{n-k} \equiv 0 \bmod d.$


Proof: By induction on $k$. It is clearly true for $k = 0, 1$. Now

    $f_{n+k+1} + (-b)^{k+1} f_{n-k-1} = (a f_{n+k} + b f_{n+k-1}) + (-b)^k a f_{n-k} + b(-b)^{k-1} f_{n-k+1}$
    $\equiv 0 \bmod d,$

by the induction hypothesis. ∎


In general, Proposition 1 shows that $f_{n+k} \equiv \pm b^k f_{n-k} \bmod d$ with $n \ge k$, assuming
that $f_n \equiv 0 \bmod d$.

Suppose now that $b = \pm 1$. The relation can be read in the opposite direction
as $f_{n-1} = -ab f_n + b f_{n+1}$, since $b^{-1} = b$. Integers $f_n$ satisfying the recurrence
relation may then be defined for all negative integers $n$ also. Proposition 1 holds
for all $k$ in this case, showing that $f_{n+k} \equiv \pm f_{n-k} \bmod d$ for all $k$, where $f_n \equiv
0 \bmod d$. Then $f_{n-k} \equiv 0 \bmod d$ if and only if $f_{n+k} \equiv 0 \bmod d$.

The set $S$ of all integers $N$ (positive and negative) for which $f_N \equiv 0 \bmod d$ is
thus invariant under ‘reflection’ in any of its elements $n \in S$, where reflection in $n$
interchanges the integers $n \pm k$.
Theorem B now follows from the geometrically obvious Proposition 2.


Proposition 2. Any set S of integers which contains 0 and is invariant under reflection 
in each element of S consists of all multiples of some fixed integer k. 


Proof: Either $S = \{0\}$ or we can take $k > 0$ as the least distance between any two
elements of $S$, which we can write as $n$ and $n + k$. Symmetry of $S$ under reflection
in $n + k$ shows that $n + 2k \in S$. By induction on $r$, symmetry about $n + (r - 1)k$
shows that $n + rk \in S$ for all positive integers $r$. Symmetry about $n$ extends this to
show that $n + rk \in S$ for all integers $r$. Because $k$ is the least distance between
any two integers in $S$ there are no further elements of $S$. Given that $0 \in S$ we can
then write $0 = n + rk$ for some $r$, so that $n$ is a multiple of $k$, and hence $S$ consists
of the multiples of $k$. ∎

In the general case of coprime $a$ and $b$ Proposition 2 holds, when restricted to
positive integers $n$ only. In this case the reflection invariance for the set $S$ should
be taken as saying that if $n \in S$ and $n \ge k$ then $n + k \in S$ if and only if
$n - k \in S$. Proposition 1 shows that $f_{n+k} \equiv \pm b^k f_{n-k} \bmod d$ with $n \ge k$ when
$f_n \equiv 0 \bmod d$. Hence the set $S$ of integers $N \ge 0$ with $f_N \equiv 0 \bmod d$ does have the
modified reflection invariance, provided that $b$ and $d$ are coprime. Theorem B then
follows in the case that $d$ is coprime to $b$.

In the remaining cases, when $b$ and $d$ have a common factor, $c > 1$ say, the
recurrence relation gives $f_{n+1} \equiv a f_n \bmod c$, and hence $f_n \equiv a^{n-1} \bmod c$. Now $a$
and $b$ are coprime, and hence $a$ and $c$ are coprime, so $f_n$ is never divisible by $c$ for
any $n > 0$. The terms $f_n$ with $n > 0$ are then never divisible by $d$; in these cases
the set $S$ consists only of 0, and again satisfies Theorem B, taking $k = 0$.


Sketch of an alternative proof of Theorem B: Observe that if $f_n \equiv 0 \bmod d$
then the sequence $f_n, f_{n+1}, \ldots, f_{n+k}, \ldots$ is a multiple of the sequence
$f_0, f_1, \ldots, f_k, \ldots$ mod $d$. Explicitly, an easy induction on $k$, using the recurrence
relation, shows that $f_{n+k} \equiv f_{n+1} f_k \bmod d$. After another induction to prove that $f_n$
and $f_{n+1}$ are coprime, and hence that $f_{n+1}$ is coprime to $d$, it follows that when
$n \in S$ then $k \in S$ if and only if $n + k \in S$. The set $S$ thus has the property that if
$m, n \in S$ with $m > n$ then $m \pm n \in S$. Theorem B follows readily.


Remarks. It is interesting to look explicitly at the sequences given by small choices
of $a$ and $b$, besides the Fibonacci sequence with $a = b = 1$, and the integers, with
$a = 2$, $b = -1$.

It is shown above that the terms $f_n$ with $n > 0$ are never divisible by any prime
factor of $b$. On the other hand Lucas showed that each prime $p$ which is coprime
to $b$ divides some term $f_n$ in the sequence, with $n > 0$, and hence divides infinitely
many terms.

Values of $n$ for which $f_n$ is divisible by $p$ can be found as follows, although
these are not always the smallest possible. Set $\Delta = a^2 + 4b$ and let $p$ be any prime
not dividing $\Delta$ or $b$. If $\Delta$ is a square mod $p$ then $f_{p-1}$ is divisible by $p$, while if $\Delta$
is not a square mod $p$ then $f_{p+1}$ is divisible by $p$. If $p$ divides $\Delta$ then $f_p$ is divisible
by $p$. Explicit details of this and other divisibility properties of Lucas are reported
in [D] and [HW].
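The rule just stated lends itself to a quick numerical check. The sketch below is illustrative only: the function names are assumptions, the check is restricted to odd primes $p$ (an assumption made here so that Euler’s criterion can be used to decide whether $\Delta$ is a square mod $p$), and primes dividing $b$ are skipped.

```python
def f_mod(a, b, n, p):
    """f_n mod p for f_0 = 0, f_1 = 1, f_{n+1} = a*f_n + b*f_{n-1}."""
    prev, cur = 0, 1 % p
    for _ in range(n):
        prev, cur = cur, (a * cur + b * prev) % p
    return prev                          # after n updates, prev holds f_n mod p

def predicted_index(a, b, p):
    """Index n (p - 1, p or p + 1) for which the rule predicts p | f_n;
    None if p divides b. Assumes p is an odd prime."""
    if b % p == 0:
        return None
    delta = (a * a + 4 * b) % p
    if delta == 0:
        return p
    is_square = pow(delta, (p - 1) // 2, p) == 1   # Euler's criterion
    return p - 1 if is_square else p + 1
```

For the Fibonacci sequence, where $\Delta = 5$, this predicts for instance $p = 11 \mid f_{10} = 55$ and $p = 7 \mid f_8 = 21$.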


ACKNOWLEDGMENTS. This proof was developed in 1993 as a result of conversations with Rob 
Baston, with whom I was sharing the teaching of an elementary course involving congruences and 
divisibility properties of integers. I am grateful to him and to Kit Nair and Alastair King for provoking
me to complete this proof as a means of avoiding the more complicated induction proofs. I must thank
the referee for suggestions which allowed me to extend my original presentation with $b = \pm 1$ to the


Generating Symmetric Groups


I. M. Isaacs and Thilo Zieschang 


It is well known that the symmetric group $S_n$ on the symbols $\{1, 2, 3, \ldots, n\}$ can be
generated by two carefully chosen permutations. It is easy to check, for example,
that the cycles $x = (1, 2)$ and $y = (1, 2, 3, \ldots, n)$ will do the job. We prove in this
note that except when $n = 4$, care is needed in the choice of only one of the two
generators.


Theorem A. Assume that $n \ne 4$ and let $x \in S_n$ be an arbitrary nonidentity element.
Then there exists an element $y \in S_n$ such that $S_n = \langle x, y \rangle$.


We mention that when $n = 4$, the conclusion of Theorem A really fails. If
$x = (1, 2)(3, 4)$, then $x$ lies in the normal Klein subgroup $K$ of $S_4$ of order 4. Since
the factor group $S_4/K$ is noncyclic, there can be no element $y \in S_4$ such that
$\langle x, y \rangle$ is the whole group.

To prove Theorem A, we need a way to recognize when a subgroup $G \subseteq S_n$ is
actually the whole group. A well-known (and nearly trivial) result of this type is
that if $G$ contains all transpositions (that is, 2-cycles) of $S_n$, then $G = S_n$. It is
almost as easy to see also that if $G$ contains all 3-cycles of $S_n$, then it contains the
alternating group $A_n$, and so either $G = A_n$ or $G = S_n$. In this case, if we can find
some odd permutation in $G$, it follows that $G$ is the whole group $S_n$.

To use the results of the previous paragraph, it may seem necessary to undergo
the tedium of checking that the subgroup $G$ contains every transposition or every
3-cycle. There is a marvelous short-cut, however, discovered around 1870 by
C. Jordan, that enables one to get away with establishing the existence of just one
transposition or one 3-cycle in $G$. Obviously, this could not possibly work for a
completely arbitrary subgroup $G \subseteq S_n$, and there is another hypothesis needed for
Jordan’s theorem.

Write $\Omega = \{1, 2, 3, \ldots, n\}$. If $x \in S_n$ and $A \subseteq \Omega$, we write $Ax$ to denote the
image of $A$ under the map $x$. (The subset $Ax \subseteq \Omega$ is called the translate of $A$
under the permutation $x$.) Now fix a subgroup $G \subseteq S_n$. A nonempty subset $A \subseteq \Omega$
is said to be a block for $G$ if for each element $x \in G$, the translate $Ax$ is either
disjoint from or equal to $A$. Clearly, each singleton subset of $\Omega$ is a block and so
too is the whole set $\Omega$, but these are certainly not very interesting and they are
referred to as trivial blocks. The situation in which Jordan’s theorem applies is
where the group $G$ is primitive, which means that the only blocks for $G$ are the
trivial ones.

It is instructive to play a little with the definitions of blocks and primitive
groups. Fix a subgroup $G \subseteq S_n$ and observe that if the set $\Omega$ can be decomposed
into pairwise disjoint parts that are permuted by the translations via elements of
$G$, then each part is a block. (We call such a decomposition of $\Omega$ a $G$-invariant
partition.) Conversely, every block for $G$ must be one of the parts of some
$G$-invariant partition. To see this, let $A$ be an arbitrary block and observe that
the translates of $A$ via elements of $G$ must also be blocks. The distinct translates
of $A$ are pairwise disjoint, therefore, and if their union is the whole set $\Omega$, we have
a $G$-invariant partition. Otherwise, we can get a $G$-invariant partition by creating
one additional part consisting of all the left-over points.

We can always decompose $\Omega$ into its orbits under the action of $G$ and we
observe that this is trivially a $G$-invariant partition. Orbits are thus blocks and it
follows that if $G$ is primitive, then either all orbits are singleton sets and $G$ is the
trivial group, or else the whole set $\Omega$ is an orbit and $G$ is transitive. It is easy to
see for $n > 2$ that the trivial group is not primitive and it follows (for $n \ne 2$) that
primitive groups are always transitive.

To see a natural example of a transitive group that is not primitive, imagine
marking the faces of a cube with the numbers 1 through 6 and let $G \subseteq S_6$ be the
group of permutations induced by rotations of the cube. (Note that $G$ is transitive
and $|G| = 24$.) For definiteness, suppose that the cube is numbered as is standard
for dice, so that on each pair of opposite faces, the numbers total 7. Since every
rotation of the cube carries a pair of opposite faces to a pair of opposite faces, we
see that the three sets $\{1, 6\}$, $\{2, 5\}$ and $\{3, 4\}$ form a $G$-invariant partition of
$\Omega = \{1, 2, 3, 4, 5, 6\}$, and hence each of them is a nontrivial block for $G$, which is
therefore imprimitive.
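The cube example is small enough to verify by machine. The Python sketch below is an illustration, not part of the paper: all function names are assumptions, permutations are stored as tuples acting on the right, and the rotation group is generated from two quarter turns chosen here, one about the axis through faces 1 and 6 (cycling the side faces 2, 3, 5, 4) and one about the axis through faces 2 and 5 (cycling 1, 3, 6, 4), which is consistent with opposite faces totalling 7.

```python
def from_cycle(cycle, n):
    """Permutation of {1..n} given by one cycle, stored 0-based as a tuple."""
    perm = list(range(n))
    for i, a in enumerate(cycle):
        perm[a - 1] = cycle[(i + 1) % len(cycle)] - 1
    return tuple(perm)

def compose(g, h):
    """Right action, as in the text: (a)(gh) = ((a)g)h."""
    return tuple(h[g[i]] for i in range(len(g)))

def generate(generators):
    """All elements of the (finite) group generated by the given permutations."""
    identity = tuple(range(len(generators[0])))
    seen, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

def is_block(subset, group):
    """True if every translate of `subset` is disjoint from or equal to it."""
    d = frozenset(subset)
    for g in group:
        image = frozenset(g[i - 1] + 1 for i in d)
        if image != d and image & d:
            return False
    return True

rotations = generate([from_cycle((2, 3, 5, 4), 6), from_cycle((1, 3, 6, 4), 6)])
```

With these generators `rotations` has 24 elements, and `is_block` confirms that each pair of opposite faces is a block while a pair of adjacent faces such as $\{1, 2\}$ is not.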

In general, if $A$ is a block of a transitive subgroup $G \subseteq S_n$, then the $G$-translates
of $A$ cover $\Omega$, and hence they form a $G$-invariant partition in which all parts have
equal size. It follows in this case that $|A|$ must divide $n$. Also, if $A$ is nontrivial,
then so are its translates, and hence if $G$ is transitive but imprimitive, it follows
that every element of $\Omega$ lies in a nontrivial block. We state this observation
formally for future reference.


Lemma. Suppose $G \subseteq S_n$ is transitive and let $a \in \Omega$. Then $G$ is primitive if the only
blocks containing $a$ are $\{a\}$ and $\Omega$. ∎


One of the goals of this paper is to provide a direct and elementary proof of 
Jordan’s theorem, which we can now state. 
Theorem (Jordan). Suppose that $G$ is a primitive subgroup of $S_n$.


(a) If $G$ contains a transposition, then $G = S_n$.
(b) If $G$ contains a 3-cycle, then either $G = S_n$ or $G = A_n$.


Proof: We prove part (a) first. Build an undirected graph $\mathcal{G}$ with vertex set
$\Omega = \{1, 2, 3, \ldots, n\}$ by joining distinct vertices $a$ and $b$ if the transposition $(a, b)$
happens to lie in the group $G$. The connected components of $\mathcal{G}$ partition the
vertex set $\Omega$ and we claim that these components are blocks for $G$. It suffices to
show that the components form a $G$-invariant partition, and so we must prove that
they are permuted by the elements of $G$. Since it is clear that the components are
permuted by graph automorphisms, we want to show that each element of $G$
actually is an automorphism of $\mathcal{G}$.

If vertices $a$ and $b$ are joined in $\mathcal{G}$, we must show for each element $g \in G$ that
vertices $(a)g$ and $(b)g$ are also joined. If $a$ and $b$ are joined, however, then the
transposition $t = (a, b)$ lies in $G$ and hence $t^g = g^{-1} t g$ is also an element of $G$.
Since $t^g$ is the transposition $((a)g, (b)g)$, however, we deduce that $(a)g$ and $(b)g$
actually are joined in $\mathcal{G}$, as required.

We now know that the connected components of $\mathcal{G}$ are blocks for $G$. By
assumption $G$ is primitive, however, and this tells us that either each component is
a singleton and the graph is totally disconnected, or else the whole set $\Omega$ is one
component and the graph is connected. Since we are given that $G$ contains a
transposition, we know that $\mathcal{G}$ contains an edge and it is not totally disconnected.
It follows that $\mathcal{G}$ is a connected graph.

To prove that $G$ is the full symmetric group, it suffices to show that it contains
an arbitrary transposition $(a, b)$. Seeking a contradiction, we assume that vertices $a$
and $b$ are not directly joined in $\mathcal{G}$. We know that there is some path leading from
$a$ to $b$ in the graph and we suppose that $a$, $m$ and $n$ are three consecutive vertices
in some shortest path from $a$ to $b$. (Note the possibility that $n = b$.) Since
transpositions $(a, m)$ and $(m, n)$ are in $G$, it follows that $(a, n) = (m, n)(a, m)(m, n)$
is also an element of $G$, and thus $a$ is joined directly to $n$ in $\mathcal{G}$. This is a
contradiction since it follows that we can delete $m$ from a shortest path from $a$ to
$b$ to obtain a still shorter path.

The proof of (b) is similar, but a little more complicated. Again we construct an
undirected graph $\mathcal{G}$ with vertex set $\Omega$, but this time, we join vertices $a$ and $b$ if $G$
contains some 3-cycle moving both $a$ and $b$. (In other words, $a$ and $b$ are joined
iff $G$ contains both the 3-cycle $(a, b, u)$ and its inverse $(b, a, u)$ for some point
$u \in \Omega$.) Here too, the permutations $g \in G$ are graph automorphisms since if
$t = (a, b, u)$ lies in $G$, then $t^g = ((a)g, (b)g, (u)g)$ also lies in $G$. As in the proof of
part (a), the hypotheses on $G$ enable us to deduce that the graph $\mathcal{G}$ is connected.

Continuing to parallel the proof of part (a), we show next that $\mathcal{G}$ is a complete
graph. Exactly as before, it suffices to show that if $a$, $m$ and $n$ are three distinct
vertices such that $a$ is joined to $m$ and $m$ is joined to $n$, then $a$ and $n$ are directly
joined. We know that $G$ contains a 3-cycle $g$ that moves $a$ and $m$ and a 3-cycle $h$
that moves $m$ and $n$, and our task is to produce a 3-cycle in $G$ that moves $a$ and $n$.
We are done unless $(n)g = n$ and we can assume that $g = (m, a, u)$ so that
$(m)g = a$. Now $h$ is a 3-cycle moving $m$ and $n$ and it follows that its conjugate $h^g$
is a 3-cycle moving $(m)g = a$ and $(n)g = n$, as desired.

To show that $G$ is either $A_n$ or $S_n$, it suffices to show that $G$ contains an
arbitrary 3-cycle $(a, b, c)$. Since the graph $\mathcal{G}$ is known to be complete, vertices $a$
and $b$ are joined and thus $G$ contains the 3-cycle $t = (a, b, u)$ for some element
$u \in \Omega$. Similarly, $G$ contains $s = (b, c, v)$ and we can certainly assume that $u \ne c$
and that $v \ne a$. If $u = v$, then $st = (b, c, u)(a, b, u) = (a, b, c)$ and this lies in $G$,
as required. If $u \ne v$, on the other hand, we compute that

    $t^{-1} s^{-1} t s t = (u, b, a)(v, c, b)(a, b, u)(b, c, v)(a, b, u) = (a, b, c)$

and again, $(a, b, c) \in G$. ∎

The following result has appeared in various places as a problem, apparently 
with the intention that it should be done by ‘brute force’. In fact, it provides a good 
demonstration of the power of Jordan’s theorem, and that is why we present it 
here. 


Theorem B. In the symmetric group $S_n$, write $x = (1, 2, 3, \ldots, n)$ and let $y =
(1, 2, 3, \ldots, m)$ for some integer $m$ such that $1 < m < n$. Then $\langle x, y \rangle$ is the whole
symmetric group unless both $n$ and $m$ are odd, in which case $\langle x, y \rangle$ is the alternating
group $A_n$.


Proof: We show first that G = ⟨x, y⟩ is primitive. Certainly, G is transitive, and so
by the lemma, it suffices to show that a block Δ containing 1 and at least one other
number a must be the whole set Ω. If a > m, then (a)y = a, and so
a ∈ Δ ∩ Δy. But Δ is a block, and hence Δy = Δ. Since 1 ∈ Δ, we see that 2 ∈ Δ,
and thus 2 ∈ Δ ∩ Δx. We conclude that Δx = Δ and thus Δ must be the whole
set Ω, as desired.

If, on the other hand, a ≤ m, then since a > 1, we see that a = (a)x⁻¹y. Thus
a ∈ Δ ∩ Δx⁻¹y and we conclude that Δx⁻¹y = Δ. Thus n = (1)x⁻¹y lies in Δ
and we are in the case of the previous paragraph.

Since we now know that G is primitive, we will be able to apply Jordan’s 
theorem if we can find a 3-cycle in G. Observe that 


(a)xy = a + 2 = (a)yx   if 1 ≤ a ≤ m − 2   and
(a)xy = a + 1 = (a)yx   if m + 1 ≤ a ≤ n − 1,


and so xy and yx agree on all numbers in Ω except possibly m − 1, m and n. It
follows that xyx⁻¹y⁻¹ fixes all but these three numbers and we compute that


(m − 1)xyx⁻¹y⁻¹ = n,
(m)xyx⁻¹y⁻¹ = m − 1 and
(n)xyx⁻¹y⁻¹ = m.

It follows that xyx⁻¹y⁻¹ = (m − 1, n, m) and G does contain a 3-cycle.

By Jordan’s theorem, G is either A_n or S_n and our remaining task is to
determine which group we actually have. If n is even, then x is an odd permutation
and if m is even, then y is an odd permutation, and so in these cases G ≠ A_n
and we conclude G = S_n. If m and n are both odd, however, then x and y are
even permutations, which lie in A_n. It follows that G ⊆ A_n and hence G = A_n in
this case. ∎
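
For small n the statement of Theorem B can be confirmed by brute force. The sketch below is an illustration rather than part of the proof: it builds the group generated by x and y by repeated right multiplication and compares its order with n! or n!/2. Points are numbered from 0 instead of 1, and all function names are hypothetical.

```python
from math import factorial


def cycle(points, n):
    """Tuple of images on {0, ..., n-1} for the cycle through `points`."""
    image = list(range(n))
    for i, p in enumerate(points):
        image[p] = points[(i + 1) % len(points)]
    return tuple(image)


def compose(g, h):
    """Right action: apply g first, then h."""
    return tuple(h[g[i]] for i in range(len(g)))


def generated_order(generators):
    """Order of the group generated by the given permutations."""
    identity = tuple(range(len(generators[0])))
    seen = {identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for g in frontier:
            for s in generators:
                product = compose(g, s)
                if product not in seen:
                    seen.add(product)
                    next_frontier.append(product)
        frontier = next_frontier
    return len(seen)


def theorem_b_holds(n, m):
    """Check that <x, y> has order n!/2 when n, m are both odd and n! otherwise."""
    x = cycle(list(range(n)), n)  # plays the role of (1, 2, ..., n)
    y = cycle(list(range(m)), n)  # plays the role of (1, 2, ..., m)
    both_odd = n % 2 == 1 and m % 2 == 1
    expected = factorial(n) // 2 if both_odd else factorial(n)
    return generated_order([x, y]) == expected
```

In a finite group, closing under products of the generators alone already yields the generated subgroup, so inverses do not have to be added explicitly.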


Proof of Theorem A. The result is clear when n < 4, and so we can assume that 
n > 4 and we consider first the case where n is odd. By renaming the symbols 
being permuted, we can suppose that x moves 1 but that (1)x ≠ 2. Let y =
(1, 2)(3, 4, ..., n), the product of a transposition and a cycle having odd length
n − 2 and write G = ⟨x, y⟩. Since n − 2 is odd, the element y^(n−2) is a
transposition in G and it suffices by Jordan’s theorem to show that G is primitive.

Since x carries 1 to something other than 2, we see that G is transitive. Let
Δ ⊂ Ω be a block containing 1, so that by the lemma, it suffices to show that
Δ = {1}. Note that |Δ| is a proper divisor of n and in particular, it is odd and at
most n/2. Since y² fixes 1, we see that 1 ∈ Δ ∩ Δy², and thus Δ = Δy². But
{3, 4, ..., n} is an orbit for ⟨y²⟩ (since n − 2 is odd), and thus if any one of the
numbers a with 3 ≤ a ≤ n lies in Δ, they all do, and |Δ| ≥ n − 1 > n/2, a
contradiction. Also 2 ∉ Δ since otherwise |Δ| = 2, and this is a contradiction too.
Thus Δ = {1} and we are now done in the case where n is odd.

Now, assume that n is even. If x is a transposition, we can suppose that
x = (1, 2) and we take y = (1, 2, 3, ..., n) so that ⟨x, y⟩ is the whole symmetric
group. If x is a 3-cycle, we can suppose that x = (1, 2, 3) and again we take
y = (1, 2, 3, ..., n). In this case too, ⟨x, y⟩ is the whole symmetric group by
Theorem B, since n is even.

We can now assume that x moves at least four points. By renaming symbols if 
necessary, we can suppose that (3)x = 4. There are at least two numbers other 
than 3 and 4 moved by x and at least one of these, say 1, is not carried to 3 and we 
can assume (1)x = 2. Now let y = (2, 3)(4, 5, ..., n), the product of a transposition
and a cycle of odd length n − 3, and let G = ⟨x, y⟩. Since n − 3 is odd, y^(n−3) is a
transposition in G and by Jordan’s theorem, it suffices to show that G is primitive.

Since x carries 1 to 2 and 3 to 4, we see that G is transitive on Ω. The lemma
thus applies, and so as before, if we suppose that Δ ⊂ Ω is a block for G
containing 1, it suffices to show that Δ = {1}. Now y fixes 1, and so 1 ∈ Δ ∩ Δy
and we conclude that Δy = Δ. It follows that if any one of the numbers a with
4 ≤ a ≤ n lies in Δ, they all do. In this situation, |Δ| ≥ n − 2 > n/2, where the
strict inequality holds because n > 4. This is a contradiction since |Δ| is a proper
divisor of n, and we conclude that Δ ⊆ {1, 2, 3}.

Recall that (1)x = 2 and (3)x = 4 and hence {1, 2, 3}x is neither equal to nor 
disjoint from {1, 2, 3}. Thus {1, 2, 3} is not a block for G, and so Δ must be a proper
subset of this set. Because Δy = Δ, however, we see that if either 2 or 3 lies in Δ,
they both do, and this is a contradiction. We conclude that Δ = {1}, as required. ∎


A result similar to Theorem A is known to be valid for the alternating group A_n
for all values of n. Although it seems likely that a proof of this result along the 
lines of our proof of Theorem A might exist, there are technical difficulties in 
some cases, and we have not actually found such a proof. 

Finally, we remark that Jordan proved much more than the result we credited 
him with here. He showed that if G is a primitive subgroup of S_n and H is a
nontrivial subgroup of G that fixes m points and is primitive in its action on the
remaining n − m points, then G is (m + 1)-fold transitive. (This means that given
two arbitrary ordered (m + 1)-tuples of distinct points of Ω, there exists an
element of G that carries one to the other.) The result of Jordan that we stated 
follows easily from this by taking H to be the subgroup generated by the given 
transposition or 3-cycle. Much more can be obtained, however. For example, 
suppose that instead of a transposition or a 3-cycle, we know that G contains a 
p-cycle for some arbitrary prime number p. It is not too hard to show from 
Jordan’s result that if p ≤ n − 3, then G must be either A_n or S_n. We refer the
reader to Wielandt’s book [1] for more information on all of this. 
On the Arithmetic-Geometric Mean
Inequality 


Lutz G. Lucht 


Beckenbach and Bellman [1] contains many beautiful proofs of the well-known 
inequality between the weighted arithmetic and geometric means of n positive real 
numbers. In this note another short proof is given which is based on the common 
log properties: (i) the log curve is concave, (ii) the log function is a homomorphism
of R₊ onto R.

Let t be a positive real number. Then


t—1>logt (1) 


except for t = 1 when equality in (1) obviously holds. This comes, for example, 
from the mean-value theorem in analysis by considering the logarithmic function 
on the interval with endpoints 1, t.

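Suppose that the real numbers ξ, x_1, ..., x_n and the weights λ_1, ..., λ_n are
positive, with λ_1 + ··· + λ_n = 1. From (1), applied to t = x_ν/ξ, we obtain after
multiplication with λ_ν ξ

$$\lambda_\nu x_\nu \ge \lambda_\nu \xi + \xi \log \frac{x_\nu^{\lambda_\nu}}{\xi^{\lambda_\nu}} \qquad (\nu = 1, \dots, n).$$

Addition gives

$$\lambda_1 x_1 + \cdots + \lambda_n x_n \ge \xi + \xi \log \frac{x_1^{\lambda_1} \cdots x_n^{\lambda_n}}{\xi}.$$

Now choose

$$\xi = x_1^{\lambda_1} \cdots x_n^{\lambda_n},$$

and the arithmetic-geometric mean inequality

$$\lambda_1 x_1 + \cdots + \lambda_n x_n \ge x_1^{\lambda_1} \cdots x_n^{\lambda_n} \tag{2}$$

follows. The above remark concerning equality in (1) shows that the inequality (2)
is strict unless x_1 = ··· = x_n.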
We notice that instead of characterizing (i) by the property the chords are under
the curve the proof uses the curve is under the tangent lines. Though by (1) only one 
tangent line seems to be involved, via (ii) all tangent lines are actually used. 
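
As a numerical illustration of the argument (not a substitute for it), the Python sketch below evaluates the weighted arithmetic and geometric means together with the intermediate bound ξ + ξ log(x_1^λ_1 ··· x_n^λ_n / ξ) obtained by adding the tangent-line inequalities. The function names are hypothetical and floating-point arithmetic is assumed to be accurate enough for a sanity check.

```python
import math


def weighted_means(xs, weights):
    """Return (arithmetic mean, geometric mean) for positive xs and weights summing to 1."""
    arithmetic = sum(w * x for w, x in zip(weights, xs))
    geometric = math.exp(sum(w * math.log(x) for w, x in zip(weights, xs)))
    return arithmetic, geometric


def tangent_line_bound(xs, weights, xi):
    """Lower bound xi + xi * log(geometric mean / xi) valid for every xi > 0."""
    log_geometric = sum(w * math.log(x) for w, x in zip(weights, xs))
    return xi + xi * (log_geometric - math.log(xi))
```

For every positive ξ the bound stays below the arithmetic mean, and at ξ equal to the geometric mean it coincides with the geometric mean itself, which is how inequality (2) arises.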
UNSOLVED PROBLEMS
Edited by: Richard Guy & Richard Nowakowski 


In this department the MONTHLY presents easily stated unsolved problems dealing 
with notions ordinarily encountered in undergraduate mathematics. Each problem 
should be accompanied by relevant references (if any are known to the author) and by a 


brief description of known partial or related results. Typescripts should be sent to 
Richard Guy, Department of Mathematics & Statistics, The University of Calgary, 
Alberta, Canada T2N 1N4. 


Three Open Problems in Functional 
Equations 


P. K. Sahoo 


In this note we seek solutions of three problems connected with the
characterizations of sum form information measures on open domains. The first problem is the
following: Find all functions f : (0, 1) → ℝ (the set of reals) satisfying the
functional equation


f(xy) + f(x(1 − y)) + f(y(1 − x)) + f((1 − x)(1 − y)) = 0   (1)
for all x, y ∈ (0, 1). This problem was stated as an open problem in [6]. If f is
assumed to be measurable, then Daroczy and Jarai [5] have shown that f(x) =
4ax − a, where a is an arbitrary constant. Recently, Maksa [13] has posed the
following problem at the Thirtieth International Symposium on Functional
Equations: Find all functions f : [0, 1] → ℝ satisfying the functional equation


(1 − x − y) f(xy) = x f(y(1 − x)) + y f(x(1 − y))   (2)
for all x, y ∈ [0, 1]. One can easily show that if f is a solution of (2), then f is skew
symmetric about 1/2, that is f(x) = −f(1 − x), and f(0) = 0. Further, it is easy to
note that Maksa’s equation (2) implies equation (1). To see this replace x by 1 − x
in (2) and add the resulting equation to (2) to obtain


y[f(xy) + f(x(1 − y)) + f(y(1 − x)) + f((1 − x)(1 − y))] = 0

for all x, y ∈ (0, 1]. Since f(0) = 0, the above equation yields (1) for all x, y ∈ [0, 1].
Thus, the general solution of (1) will provide the general solution of (2). Utilizing
the solution given by Daroczy and Jarai [5] of the equation (1), it is easy to show
that if f is measurable or almost open, then all solutions of (2) are of the form
f(x) = 0.

The second problem is the following: Find all functions f : (0, 1) → ℝ satisfying
the functional equation


f(xy) + f((1 − x)(1 − y)) = f(x(1 − y)) + f(y(1 − x))   (3)


for all x, y ∈ (0, 1). Daroczy and Jarai [5] have also found the measurable solution
of this functional equation. They have shown that the measurable solution of (3) is
of the form f(x) = ax² − ax + b log x + c, where a, b and c are arbitrary
constants. Equation (3) appears in [11] as a problem posed by Lajko when (3) holds for
all x and y in ℝ. Eliezer [7] has determined the differentiable solution of Lajko’s
problem. Eliezer proved that if f is differentiable and satisfies (3) for all x and y
in ℝ, then f(x) = ax² − ax + c, where a and c are arbitrary constants.

Finally, our last problem is the following: Find all functions f, g, h : (0, 1) → ℝ
satisfying the functional equation


f(xy) + f(x(1 − y)) + f(y(1 − x)) + f((1 − x)(1 − y)) = g(x)h(y)
for all x, y ∈ (0, 1).   (4)


This functional equation also arises in the characterizations of information
measures (see [6] and [12]) and equation (1) is a special case of it. The measurable
complex-valued solution of this equation has been obtained by Losonczi in [12].
Interested readers are referred to [1, 2, 3, 4, 8, 9, 10, 14, 15] for treatments of the
general subject.
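
The measurable solutions quoted above are easy to test numerically. The following Python sketch, with hypothetical helper names and arbitrarily chosen constants a, b, c, evaluates the residuals of equations (1) and (3) for the families f(x) = 4ax − a and f(x) = ax² − ax + b log x + c; it checks only that these families satisfy the equations, not that they are the only solutions.

```python
import math


def residual_eq1(f, x, y):
    """Left-hand side of equation (1); zero when f satisfies (1) at (x, y)."""
    return f(x * y) + f(x * (1 - y)) + f(y * (1 - x)) + f((1 - x) * (1 - y))


def residual_eq3(f, x, y):
    """Difference of the two sides of equation (3)."""
    return (f(x * y) + f((1 - x) * (1 - y))
            - f(x * (1 - y)) - f(y * (1 - x)))


def affine_solution(a):
    """The family f(x) = 4ax - a reported for equation (1)."""
    return lambda x: 4 * a * x - a


def log_quadratic_solution(a, b, c):
    """The family f(x) = ax^2 - ax + b log x + c reported for equation (3)."""
    return lambda x: a * x * x - a * x + b * math.log(x) + c
```

The first family works because the four arguments xy, x(1 − y), y(1 − x), (1 − x)(1 − y) always add up to 1; the second relies in addition on the identity xy · (1 − x)(1 − y) = x(1 − y) · y(1 − x).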
PROBLEMS AND SOLUTIONS


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


Proposed problems should be sent to the MONTHLY PROBLEMS address given on the 
inside front cover. Please include solutions and relevant references. Three copies 
of all items needed to evaluate the problem should be sent. 


Solutions of published problems should arrive at the MONTHLY PROBLEMS address 
given on the inside front cover before March 31, 1996. If possible, solutions should 
be typed with double spacing. Two copies suffice. Several solutions may be mailed 
together, but they should be on separate sheets of paper. The problem number and 


the solver’s name and mailing address should appear on each solution. A mailing 
label should be included if an acknowledgment is desired.


The published solution is likely to be based on a solution that is complete and
correct. Additional information, such as references to other appearances of the
problem or its solution, is also welcome.


An asterisk (*) after the number of a problem, or part of a problem, indicates that 
no solution is currently available.


PROBLEMS 


10473. Proposed by Emre Alkan (student), Bosphorus University, Istanbul, Turkey. 


Prove that there are infinitely many positive integers m such that 


$$\frac{1}{5 \cdot 2^m} \sum_{k=0}^{m} \binom{2m+1}{2k} 3^k$$


is an odd integer. 


10474. Proposed by Harry Tamvakis (student), The University of Chicago, Chicago, IL. 


Consider a triangle ABC and a point P in the interior of ABC, and let the lines AP,
BP, CP meet the lines BC, CA, AB at the points D, E, F respectively. Show that ∠EDF
is a right angle if and only if

1 1 n 1 1 
|PD| |AD| |BD|— |CD\’ 
10475. Proposed by Wu Wei Chao, He Nan Normal University, Xin Xiang City, He Nan
Province, China. 


For 0 < x < y < 1 or 1 < x < y, prove that


y~ [ x” >y /x > y* / x. 


10476. Proposed by Simeon T. Stefanov, Sofia, Bulgaria. 


Let X be a countable compact Hausdorff space. Prove that every continuous map f : X →
X has a periodic point.


10477. Proposed by B. H. Neumann, Australian National University, Canberra, Australia, 
and D. G. Rogers, University of Aberdeen, Aberdeen, Scotland. 


Let S be a subset of an abelian group A with the composition operator + and assume
that S is closed under negation. If f is a permutation of S, construct the functions If, Rf
and Cf from S to A as follows. If f(i) = j, define


If(j) = i;   Rf(−i) = −j;   Cf(i) = i − j.


Let e be the identity function on S, and call a permutation p of S complete if q = p − e is
again a permutation of S.

(a) Show that, if p is a complete permutation of S, then so are Ip, Rp and Cp, and hence
that I, R and C may be viewed as operators on the set of complete permutations of S.

(b) Show that I, R and C are involutions, and determine the largest group that they can
generate.

(c) If p is a complete permutation of S, show that the function Bp defined by Bp = p ∘ Iq
is also a complete permutation. Does the involution B defined in this way lie in the group
generated by I, R and C?


10478. Proposed by Joan P. Hutchinson, Macalester College, Saint Paul, MN. 


Let P be a simple closed n-gon, not necessarily convex (an “art gallery”), with some 
pairs of vertices joined by nonintersecting interior diagonals (“walls”), and suppose that in 
the interior of each of these diagonals there is an arbitrarily placed, arbitrarily small opening 
(a “doorway”). Determine the size of the smallest set G of points (“guards”) so that for 
every other point g in P there is a line segment in P, disjoint from the punctured diagonals,
that joins g to a point of G. 


10479. Proposed by Jeffrey C. Lagarias, AT&T Bell Laboratories, Murray Hill, NJ, and 
Bjorn Poonen, University of California, Berkeley, CA. 


Let p be an odd prime, and consider the polynomial

$$F_p(x) = \sum_{j=0}^{p-1} \left(\frac{j}{p}\right) x^j$$

with coefficients modulo p. Show that, as a polynomial over the finite field with p elements,
F_p(x) has a root at x = 1 of multiplicity exactly (p − 1)/2.
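
The claim can be explored numerically for small primes. The Python sketch below assumes, following the Notes to this problem, that the coefficients of F_p(x) are the Legendre symbols (j/p) for 0 ≤ j < p; the symbols are computed with Euler's criterion and factors of (x − 1) are removed by synthetic division over the field with p elements. The function names are hypothetical.

```python
def fekete_coefficients(p):
    """Legendre symbols (j/p), j = 0..p-1, reduced mod p via Euler's criterion."""
    return [pow(j, (p - 1) // 2, p) for j in range(p)]


def divide_by_x_minus_1(coeffs, p):
    """Divide sum(coeffs[j] * x**j) by (x - 1) over F_p; return (quotient, remainder)."""
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for j in range(len(coeffs) - 1, 0, -1):
        carry = (carry + coeffs[j]) % p
        quotient[j - 1] = carry
    remainder = (carry + coeffs[0]) % p
    return quotient, remainder


def multiplicity_at_one(coeffs, p):
    """Number of times (x - 1) divides the polynomial over F_p."""
    count = 0
    while len(coeffs) > 1:
        quotient, remainder = divide_by_x_minus_1(coeffs, p)
        if remainder != 0:
            break
        coeffs = quotient
        count += 1
    return count
```

Calling `multiplicity_at_one(fekete_coefficients(p), p)` for a few small odd primes gives a quick consistency check of the statement; it does not replace the requested proof.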
NOTES


(10476) A point x ∈ X is called a periodic point of a map f : X → X if f^p(x) = x for some
integer p ≥ 1. (10477) Complete permutations are similar to the graceful permutations of
{1, ..., n} of E 3455 [1991, 646; 1992, 691]. The involutions I and R are similar to the
involutions z and p in the solution of that problem. (10478) The Art Gallery Theorem states
that an n sided art gallery can be guarded by ⌊n/3⌋ guards, and this bound is best possible
(see J. O’Rourke, Art Gallery Theorems and Algorithms, Oxford University Press, 1987).
However, real art galleries have opaque interior walls. This problem adds that feature. The
guards may stand in doorways and survey both adjacent rooms; lines of sight may pass
along walls. (10479) While this problem considers the F_p(x) as polynomials over a finite
field, the coefficients are Legendre symbols, and could be written as ±1 for 0 < j < p.
Such polynomials with coefficients ±1 were introduced by M. Fekete in 1912 in a study
of zeros of Dirichlet L-functions. Classically, the roots of these polynomials in the real
numbers have been studied. See G. Pólya “Verschiedene Bemerkungen zur Zahlentheorie”
(Collected Works, Analysis, pp. 76-85 and 488) for more information.


SOLUTIONS 


A Special Value of a Quadratic Form 


10258[1992, 873]. Proposed by Hans Liebeck and Anthony Osborne, University of Keele, 
England. 


Let a, b, and c be positive integers which are pairwise relatively prime. Prove that if the
congruences
A² ≡ −bc (mod a), B² ≡ −ca (mod b), C² ≡ −ab (mod c)
are solvable for A, B, and C, then the equation
ax² + by² + cz² = abc
has a solution in integers x, y, and z.


Solution by Robin J. Chapman, University of Exeter, Exeter, U. K. Define u, v, w by
u ≡ C/a (mod c), v ≡ B/a (mod b), and w ≡ A/b (mod a). Let


Λ = {(x, y, z) ∈ Z³ : x ≡ uy (mod c), x ≡ vz (mod b), y ≡ wz (mod a)}.


It is clear that Λ is a sublattice of Z³. If (x, y, z) ∈ Λ and z is given, then the congruence
class of y modulo a is determined, and then so are the congruence classes of x modulo b and
modulo c, i.e., modulo bc. Hence Λ has index abc in Z³. Since w²b ≡ −c (mod a), we
also have ax² + by² + cz² ≡ b(wz)² + (−bw²)z² ≡ 0 (mod a). Similarly, ax² + by² + cz²
is also divisible by b and c.

Let 


E = {(x, y, z) ∈ R³ : ax² + by² + cz² < 2abc}.
This is an open ellipsoidal ball of volume

$$V = \frac{4\pi}{3} \cdot \frac{(2abc)^{3/2}}{\sqrt{abc}} = \frac{8\sqrt{2}\,\pi}{3}\, abc.$$

Since π√2/3 > 1, we have V > 8abc. Now Minkowski’s Theorem guarantees a nonzero
(x, y, z) in Λ ∩ E. With 0 < ax² + by² + cz² < 2abc and ax² + by² + cz² divisible by
abc, we have ax² + by² + cz² = abc, as desired.


Solved also by I. Kastanas, A. D. Melas (Greece), F. Schmidt, GCHQ Problem Solving Group (U. K.), and the pro- 
posers. One incorrect solution and one incomplete solution were also received. 
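
The statement of Problem 10258 lends itself to a small exhaustive check. The Python sketch below is an illustration with hypothetical function names: it tests the three congruence conditions by trying every residue and then searches for an integer solution of ax² + by² + cz² = abc by direct enumeration. It does not use the lattice argument of the solution.

```python
from math import gcd, isqrt


def is_square_mod(value, modulus):
    """True when t*t = value (mod modulus) has a solution t."""
    return any((t * t - value) % modulus == 0 for t in range(modulus))


def hypotheses_hold(a, b, c):
    """Pairwise coprimality plus solvability of the three congruences."""
    pairwise_coprime = gcd(a, b) == gcd(b, c) == gcd(a, c) == 1
    return (pairwise_coprime
            and is_square_mod(-b * c, a)
            and is_square_mod(-c * a, b)
            and is_square_mod(-a * b, c))


def find_solution(a, b, c):
    """Return nonnegative integers (x, y, z) with ax^2 + by^2 + cz^2 = abc, or None."""
    target = a * b * c
    for x in range(isqrt(target // a) + 1):
        for y in range(isqrt((target - a * x * x) // b) + 1):
            rest = target - a * x * x - b * y * y
            if rest % c == 0:
                z = isqrt(rest // c)
                if z * z == rest // c:
                    return x, y, z
    return None
```

For example, (a, b, c) = (1, 2, 3) satisfies the hypotheses, and the search returns a triple such as x = y = z = 1 with 1 + 2 + 3 = 6.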


Powers of the Symmetric Group 


10267 [1992, 958]. Proposed by Lenny Jones and Mike Seyfried, Shippensburg University,
Shippensburg, PA, and Stephen Schroer, Mercersburg Academy, Mercersburg, PA.


Find all pairs of positive integers (n, k) such that the set of all kth powers of elements of
the symmetric group S_n on n things is a proper subgroup of S_n.


Solution by National Security Agency Problems Group, Fort Meade, MD. The only such
pairs (with k less than the exponent of S_n) are (3, 2), (3, 4), (4, 2), (4, 6), (4, 10), and the pairs
(5, 2l) such that l is relatively prime to 60 and 2l is less than 60.

Suppose that G_k, the set of all kth powers in S_n, is a proper subgroup. Since G_k is
invariant under conjugation, it must be a normal subgroup.

Suppose first that n ≥ 6. In this case, the only proper normal subgroup of S_n is the
alternating group A_n, so G_k = A_n. The kth power of the transposition (12) must lie in
G_k = A_n, so k is even. Thus all elements of G_k are squares. The even permutation with
cycle representation (1234)(56) is not a square, since its square root would have order 8, but
the exponent of S_6 being 3 × 4 × 5 prohibits elements of order 8. Therefore, there are no
solutions for n ≥ 6.

Now suppose that n ≤ 5. Since S_1 and S_2 have no proper subgroups, we require n ≥ 3.
Without loss of generality, we assume that k is less than the exponent of S_n. A direct search
reveals that (n, k) = (3, 2), (3, 4), (4, 2), (4, 6), (4, 10) are the only solutions with n ≤ 4.
The exponent of S_5 is 60, and the argument of the first paragraph implies that G_k is a proper
subgroup only if G_k = A_5. As above, we conclude that k is even. Thus G_k ⊆ G_2, but also
G_2 ⊆ A_5, so G_k = G_2. Since A_5 is the set of squares of elements of S_5, (5, 2) is a solution.
If l is relatively prime to the exponent 60 of S_5, then the map sending x to x^l is injective.
Hence G_{2l} = G_2 = A_5, and the pairs described above are all solutions.

It remains to show that G_{2l} ≠ A_5 if gcd(l, 60) ≠ 1. If l is even, then G_{2l} ⊆ G_4, but
(12)(34) ∈ A_5 is not a fourth power; its fourth root would have order 8, but no element of
S_5 has order greater than 6. Similarly, if 3 divides l, then G_{2l} ⊆ G_6, but (123) ∈ A_5 is not
a sixth power. Lastly if 5 divides l, then G_{2l} ⊆ G_{10}, but (12345) ∈ A_5 is not a tenth power.


Solved also by R. J. Chapman (U. K.), G. Ehrlich, S. M. Gagola Jr., O. P. Lossers (The Netherlands), F. Schmidt, 
GCHQ Problem Solving Group (U. K.), and the proposers. One incorrect solution was received.
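
The "direct search" mentioned in the solution is small enough to reproduce. The Python sketch below, with hypothetical function names, enumerates S_n for n ≤ 5, forms the set of kth powers for each k below the exponent of S_n, and tests whether that set is a proper subgroup. A nonempty finite subset closed under multiplication is a subgroup, so only closure is tested.

```python
from itertools import permutations
from math import gcd


def compose(g, h):
    """Apply g first, then h (permutations as tuples of images)."""
    return tuple(h[i] for i in g)


def exponent_of_symmetric_group(n):
    """Least common multiple of 1, ..., n."""
    result = 1
    for k in range(2, n + 1):
        result = result * k // gcd(result, k)
    return result


def kth_powers(group, k):
    """Set of all kth powers of the elements of `group`."""
    powers = set()
    for g in group:
        result = tuple(range(len(g)))
        for _ in range(k):
            result = compose(result, g)
        powers.add(result)
    return powers


def is_proper_subgroup(subset, group_order):
    """True when `subset` is closed under products and smaller than the group."""
    if len(subset) == group_order:
        return False
    return all(compose(g, h) in subset for g in subset for h in subset)


def solutions(n):
    """All k below the exponent of S_n whose kth powers form a proper subgroup."""
    group = list(permutations(range(n)))
    return [k for k in range(1, exponent_of_symmetric_group(n))
            if is_proper_subgroup(kth_powers(group, k), len(group))]
```

Running `solutions(n)` for n = 3, 4, 5 gives the values of k to compare with the pairs listed in the solution.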


Splitting a Sequence of Ultrafilters 


10273 [1992, 958]. Proposed by Jesus Ferrer, Universidad de Valencia, Burjasot, Spain.


Let (U_n) be a sequence of distinct ultrafilters on the set N of non-negative integers.
(a) Show that there is a sequence of disjoint sets (A_k) such that each A_k is an element of
some U_n.
(b) Show that there is M ⊆ N such that
{n ∈ N : M ∈ U_n} and {n ∈ N : M ∉ U_n}


are both infinite. 


Solution by Timothy J. LaBerge, Union College, Schenectady, NY. Let (U_n) be a sequence
of distinct ultrafilters on N. For a subset A of N, we let A^c denote the relative complement
N∖A of A in N. We begin by recursively constructing a subsequence (U_{n_k}) of (U_n) and a
sequence (A_k) of pairwise disjoint infinite subsets of N satisfying
1. A_k ∈ U_{n_k}, and
2. {n ∈ N : B_k ∈ U_n} is infinite,
where B_k = N if k = 0 and B_k = ⋂_{i<k} A_i^c for k > 0.

Suppose that we have constructed U_{n_0}, U_{n_1}, ..., U_{n_{k−1}} satisfying 1 and 2 (if k = 0, set
n_{−1} = −1). Choose m and m′ greater than n_{k−1} such that B_k ∈ U_m ∩ U_{m′} (this is possible
by 2). Since U_m and U_{m′} are distinct ultrafilters, there is an infinite A ⊆ N such that A ∈ U_m
and A^c ∈ U_{m′}. Because B_k is the disjoint union of B_k ∩ A and B_k ∩ A^c, any ultrafilter
U_n that contains B_k as an element must contain exactly one of B_k ∩ A and B_k ∩ A^c.
Therefore, either {n ∈ N : B_k ∩ A ∈ U_n} is infinite or {n ∈ N : B_k ∩ A^c ∈ U_n} is infinite.
If {n : B_k ∩ A ∈ U_n} is infinite, set n_k = m′ and A_k = B_k ∩ A^c. Otherwise, let n_k = m and
A_k = B_k ∩ A. Clearly 1 and 2 are satisfied.

The sequence (A_k) is a pairwise disjoint family of infinite subsets of N, and by
construction, A_k ∈ U_{n_k}. This proves (a). To prove (b), given such a sequence (A_k), we
set

M = ⋃_{n ∈ N} A_{2n}.

Because the U_n are filters, and A_{2k} ∈ U_{n_{2k}}, {n ∈ N : M ∈ U_n} is infinite. Similarly, for each
k ∈ N, M^c ∈ U_{n_{2k+1}}. This means that M ∉ U_{n_{2k+1}}, so {n ∈ N : M ∉ U_n} is also infinite.

One can interpret this solution topologically. Topologize X = N ∪ {U_n : n ∈ N} so that
the points n ∈ N are isolated, and so that a basis for the neighborhoods of U_n consists of
sets of the form


B(A) = A ∪ {U_m : A ∈ U_m},
for A ∈ U_n. It is easy to see that the topology determined by this assignment of basic open
sets gives a zero-dimensional Hausdorff (hence regular) topology on X. Note that the given
basis consists of clopen (closed and open) sets. Actually, X with this topology is a subspace
of the Stone-Čech compactification βN of N.

Now, every infinite subset of a Hausdorff space contains an infinite relatively discrete
subspace, so we can find a subsequence (U_{n_k}) of (U_n) that is relatively discrete. In a regular
space, every countable relatively discrete subspace has an expansion to disjoint open sets.
Thus there is a sequence of basic open sets (B(A_k)) with A_k ∈ U_{n_k} that is pairwise disjoint.
In particular, these A_k are pairwise disjoint, and (a) and (b) follow.


Editorial comment. Kenneth Schilling notes that (a) cannot be improved to the
existence of disjoint sets (A_n) such that each A_n ∈ U_n. For a class of counterexamples, he lets
(B_n : n ∈ N∖{0}) be a sequence of disjoint nonempty subsets of N, and lets (U_n : n ∈ N∖{0})
be a sequence of distinct ultrafilters on N such that B_n ∈ U_n for all n. Let V be any
nonprincipal ultrafilter on N and define the ultrafilter U_0 by putting S ∈ U_0 if {n ∈ N : S ∈ U_n} ∈ V.
Then (U_n : n ∈ N) is still a sequence of distinct ultrafilters (since B_n ∉ U_0 for all n), but
S ∈ U_0 is also in many U_n and, hence, has a nonempty intersection with B_n.


Solved also by R. J. Chapman (U. K.), J.-C. Evard, R. Holzsager, D. W. Jakel, Z. Lipecki (Poland), O. P. Lossers 
(The Netherlands), R. Martin (student), N. Passell, M. Scheepers, K. Schilling, L. Wertheim (student, Russia), GCHQ 
Problem Solving Group (U. K.), and the proposer. 
Quadratic Forms With Specified Parameters


10276[1993, 76]. Proposed by Steven M. Gagola, Jr., Kent State University, Kent, OH. 


If

$$M = \begin{pmatrix} x & y \\ z & w \end{pmatrix},$$

define det M = xw − yz and dot M = xz + yw. Determine necessary and sufficient
conditions on a field F, assumed to have characteristic different from 2, for the existence
of quadratic forms q_ij ∈ F[x, y, z, w] (i, j ∈ {0, 1}) such that det Q = (det M)² and
dot Q = (dot M)², where

$$Q = \begin{pmatrix} q_{00} & q_{01} \\ q_{10} & q_{11} \end{pmatrix}.$$


In particular, do such forms exist when F = Q?


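Solution by Robin J. Chapman, University of Exeter, Exeter, U. K. Such forms exist if
and only if either 2 or −2 is a square in F. Hence they do not occur if F = Q. Let ε = 1
or −1. If 2ε is a square in F, let

$$Q = \begin{pmatrix} \sqrt{2\varepsilon}\,\det M + \operatorname{dot} M & -\varepsilon\,\operatorname{dot} M \\ \tfrac{1}{2}\operatorname{dot} M & \tfrac{1}{2}\bigl(\varepsilon\sqrt{2\varepsilon}\,\det M - \varepsilon\,\operatorname{dot} M\bigr) \end{pmatrix}.$$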


The verifications are immediate. 
Conversely, assume that there exist quadratic forms q_ij with the stated properties. A
straightforward computation yields, for example,

$$\begin{aligned}
(q_{00}^2 + q_{01}^2)(q_{10}^2 + q_{11}^2) &= (\det Q)^2 + (\operatorname{dot} Q)^2 \\
&= (\det M)^4 + (\operatorname{dot} M)^4 \\
&= x^4(z^4 + w^4) + 4x^3yzw(z^2 - w^2) + 12(xyzw)^2 \\
&\quad + 4xy^3zw(w^2 - z^2) + y^4(w^4 + z^4) \\
&= f(x, y, z, w).
\end{aligned}$$


Thus, f is the product of two quartic forms over F. Since the coefficients of x⁴ and x³
in f have no common factor, any nontrivial factor of f must involve x. Furthermore,
f(x, 0, z, w) = x⁴(z⁴ + w⁴) must factor into two quartics each involving x, so that z⁴ + w⁴
cannot be irreducible over F. Consequently, t⁴ + 1 is reducible over F.

If α is a root of t⁴ + 1 in F, then α² = i (where, as usual, i² = −1); it follows that
(α + α⁻¹)² = 2, and so we are done. The only other possibility is that t⁴ + 1 splits into
two quadratic factors in F[t]. These two factors must be among


t² ± i,   t² ± √2 t + 1,   t² ± √−2 t − 1,


since the roots of t⁴ + 1 in an algebraic closure of F are the primitive eighth roots of unity
(±1 ± i)/√2. Thus, F contains at least one of {√2, √−2, i}. If it contains one of the
first two, we are done; otherwise,

f(x, y, z, w) = (q_00 + i q_01)(q_00 − i q_01)(q_10 + i q_11)(q_10 − i q_11)


splits into four quadratic factors, and so x⁴(z⁴ + w⁴) must split into four factors each
involving x. But this means that t⁴ + 1 has a root in F. As we saw before, this implies that
√2 ∈ F, and we are done.


Solved also by the proposer. 
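
The "immediate" verifications can be spot-checked numerically. The Python sketch below assumes the reading Q = [[√(2ε) det M + dot M, −ε dot M], [½ dot M, ½(ε√(2ε) det M − ε dot M)]] of the matrix in the solution, which is a reconstruction from a damaged display, and works over the complex numbers so that both ε = 1 and ε = −1 can be tried. It also evaluates the quartic form f; all function names are hypothetical.

```python
import cmath


def det(m):
    """det M = xw - yz for M = ((x, y), (z, w))."""
    (x, y), (z, w) = m
    return x * w - y * z


def dot(m):
    """dot M = xz + yw, the dot product of the two rows."""
    (x, y), (z, w) = m
    return x * z + y * w


def build_q(m, eps):
    """Assumed form of Q built from a square root of 2*eps."""
    s = cmath.sqrt(2 * eps)
    d, t = det(m), dot(m)
    return ((s * d + t, -eps * t),
            (t / 2, (eps * s * d - eps * t) / 2))


def quartic_f(x, y, z, w):
    """The expanded form of (det M)^4 + (dot M)^4 given in the solution."""
    return (x**4 * (z**4 + w**4) + 4 * x**3 * y * z * w * (z**2 - w**2)
            + 12 * (x * y * z * w)**2
            + 4 * x * y**3 * z * w * (w**2 - z**2) + y**4 * (w**4 + z**4))
```

For any numeric matrix M one can then compare `det(build_q(M, eps))` with `det(M)**2` and `dot(build_q(M, eps))` with `dot(M)**2`, up to rounding.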
On a Conjecture of Sophie Germain


10277[1993, 76]. Proposed by L.E. Mattics, University of South Alabama, Mobile, AL. 


Let p be a prime with p ≡ 1 (mod 4). Show that there are integers x and y such that
x^p + y^p is of the form u² + pv² for integers u and v, but x + y is not of that form.


Solution by A.N. ’t Woord, Eindhoven University of Technology, Eindhoven, The
Netherlands. We prove the claim without the condition that p be prime. Let p > 1 be an integer
with p ≡ 1 (mod 4). Put x = (1 + 4p)(1 + 2^p) and y = 2x. Now

$$\begin{aligned}
x^p + y^p &= (1 + 2^p)x^p = (1 + 2^p)(1 + 4p)^p(1 + 2^p)^p \\
&= (1 + 4p)(1 + 4p)^{p-1}(1 + 2^p)^{p+1} \\
&= \bigl[(1 + 4p)^{(p-1)/2}(1 + 2^p)^{(p+1)/2}\bigr]^2 + p\bigl[2(1 + 4p)^{(p-1)/2}(1 + 2^p)^{(p+1)/2}\bigr]^2.
\end{aligned}$$

Hence x^p + y^p has the form u² + pv². Since squares are congruent to 0 or 1 modulo 4,
and p ≡ 1 (mod 4), no number of the form u² + pv² can be congruent to 3 modulo 4.
However,

x + y = 3x = 3(1 + 4p)(1 + 2^p) ≡ 3 · 1 · 1 = 3 (mod 4).


Hence x + y does not have the form u² + pv².


Solved also by the proposer. One incomplete solution was received. 
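
Because Python integers have arbitrary precision, the construction in the solution can be checked exactly for particular values of p. The sketch below uses a hypothetical function name and simply returns x, y together with the u and v read off from the displayed identity.

```python
def germain_witness(p):
    """Return (x, y, u, v) following the construction x = (1 + 4p)(1 + 2^p), y = 2x."""
    x = (1 + 4 * p) * (1 + 2 ** p)
    y = 2 * x
    base = (1 + 4 * p) ** ((p - 1) // 2) * (1 + 2 ** p) ** ((p + 1) // 2)
    u, v = base, 2 * base
    return x, y, u, v
```

For p = 5 this gives x = 21 · 33 = 693 and y = 1386, and one can confirm that x^p + y^p equals u² + pv² while x + y leaves remainder 3 on division by 4.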


Collaborating editors: David F. Appleyard, Paul T. Bateman, Duane M. Broline, Barry 
W. Brunson, Frank S. Cater, Gulbank D. Chakerian, Underwood Dudley, Gerald A. Edgar, 
Michael A. Filaseta, Ira M. Gessel, Richard A. Gibbs, Jerrold R. Griggs, Douglas A. Hensley, 
John R. Isbell, Mourad E. H. Ismail, Murray Klamkin, Daniel J. Kleitman, Frederick W. 
Luttmann, Frank B. Miles, Richard Pfiefer, Stephen L. Portnoy, J. O. Shallit, John Henry 
Steelman, Kenneth B. Stolarsky, David E. Tepper, Douglas B. Tyler, Daniel Ullman, and 
William E. Watkins. 


One picture is worth a thousand words, provided one uses another 
thousand words to justify the picture. 


H. M. Stark
My Favorite Elliptic Curve:
A Tale of Two Types of Triangles 


Richard K. Guy 


One of the many beauties of elliptic curves is their blend of arithmetic and 
geometry, not only intrinsically but also in their applications. If you want to learn 
more about them there are several good introductions available: Silverman & Tate 
[9], Knapp [7] and Cassels [2], who manages to write a whole book on elliptic 
curves without using the word ‘rank.’ 

The curve of the title (88A in [1] or [4]) is: 


Y² = X³ − 4X + 4.

Figure 1 shows a picture of part of it. It’s a fairly uncomplicated curve: it
has only one real component and doesn’t break up into an ‘egg’ and an infinite 
branch as many elliptic curves do. Moreover, it doesn’t have any torsion points, 
points of finite order, except for the point at infinity, which we must always 
remember. And I thank the referee for reminding me that when I say ‘torsion 
points’ this is an ellipsis for ‘rational torsion points.’ For example, the points of 
inflexion are of order three, but they are not rational on this curve. One of the 
difficulties for the beginner is keeping track of what field he is working in: it is 
often convenient to vary the focus from complex to real to rational, and even to 
consider finite fields. 


Figure 1. The elliptic curve Y² = X³ − 4X + 4.
The curve does have several obvious rational points:
(0, ±2), (1, ±1), (2, ±2), (−2, ±2).


The points of an elliptic curve form a group. Take the point at infinity as the 
(additive) identity, 0. The group law is described by noting that a straight line 
meets a cubic curve in three points whose sum we define to be 0. For example, the 
ordinate X = 2 meets the curve in (2, ±2) and the point at infinity, so if (2, ±2)
are P and Q, then 


P+Q+0=0 


and Q = —P. The tangent at (2,2) meets the curve again at (0, —2) = R, say, so 
that 


P+P+R=0, 


R = −2P and (0, 2) = 2P. On joining this to P we see that (−2, 2) = −3P,
(−2, −2) = 3P. By joining −P to −3P or drawing the tangent at −2P we
discover that 4P = (1, −1) and then 5P = (6, −14), 6P = (8, 22) and so on. We
soon convince ourselves that there is an infinity of rational points on the curve. In 
fact a theorem of Mazur (see [6], p. 223, Theorem 7.5, for example) tells us that 
there can’t be more than 16 rational points of finite order. The rank of the curve is 
1; all rational points can be derived from the generator P = (2, 2). 


Warning: to prove that a point is a generator usually requires more sophistication 
than we display here. 
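
The chord-and-tangent computations above can be carried out in exact rational arithmetic. The Python sketch below is an illustration with hypothetical names: it implements the standard addition formulas for a curve Y² = X³ + AX + B (here A = −4, B = 4), uses `None` for the point at infinity, and computes multiples by repeated addition.

```python
from fractions import Fraction

A_COEFF = -4  # coefficient of X in Y^2 = X^3 - 4X + 4


def add(p, q):
    """Sum of two points on the curve; None stands for the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and y1 == -y2:
        return None  # vertical line: the points are negatives of each other
    if p == q:
        slope = (3 * x1 * x1 + A_COEFF) / (2 * y1)  # tangent line
    else:
        slope = (y2 - y1) / (x2 - x1)  # chord
    x3 = slope * slope - x1 - x2
    y3 = slope * (x1 - x3) - y1
    return (x3, y3)


def multiple(n, p):
    """n-fold sum p + p + ... + p for n >= 1."""
    result = None
    for _ in range(n):
        result = add(result, p)
    return result


P = (Fraction(2), Fraction(2))
```

With this, `multiple(4, P)` returns the point (1, −1) and `multiple(5, P)` returns (6, −14), in agreement with the values found geometrically.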


A mixture of cevians. Problem E3434 in the April 1991 MONTHLY asked, or should
have asked, for integer triangles ABC in which the median from A, the bisector of
angle B, and the altitude from C are concurrent. At the time of writing, no
solution has been published, though I have seen an interesting one due to J. G.
Mauldon, which makes no explicit use of an elliptic curve.


Figure 2. Triangle with concurrent median, angle-bisector and altitude. 
Ceva’s theorem ([3, p. 4], for example) tells us that three concurrent lines drawn
from the vertices of a triangle divide the sides in ratios whose product is 1:

$$\frac{BX}{XC} \cdot \frac{CY}{YA} \cdot \frac{AZ}{ZB} = \frac{a/2}{a/2} \cdot \frac{a}{c} \cdot \frac{b\cos A}{a\cos B} = 1,$$

where the middle ratio comes from the angle-bisector theorem. Multiply b cos A
= c cos B by 2ac and the cosine formula gives

a(b² + c² − a²) = c(c² + a² − b²).

Put

$$Y = \frac{2b}{a + c}, \qquad X = \frac{2c}{a + c}$$

and we get our favorite curve


Y² = X³ − 4X + 4.

So we seem to have found an infinity of such triangles, but a complication is 
that not all rational points on the curve give real triangles. The transformation we 
just made inverts to 

(a:b:c)=(2-X:Y:X). 
We can change the signs of all three of a, b and c, so we do this if necessary to 
make a positive. We can change the sign of Y, since the curve is symmetrical, and 
so make b positive. And we can interpret either sign for c: when c is negative, Y 
divides CA externally in the ratio a:c and BY is the external bisector of angle B 
(Figure 3). 


Figure 3. Triangle with external angle-bisector concurring with median and altitude. 


If X > 0 the triangle inequality requires that Y > 2X - 2, Y > 2 - 2X and
2 > Y, i.e. that we are inside the region in Figure 1 bounded by the tangents at
±P and the line Y = 2, i.e. on the piece of curve 0 < X < 2, 0 < Y < 2. Such
points give us genuine internal bisector triangles. The point -4P corresponds to
the equilateral triangle.

If X < 0 the triangle inequality gives Y > -2, Y > 2 and 2 - 2X > Y. We are
on the piece of the curve below the tangent at -P and above the line Y = 2: i.e.
-2 < X < 0, Y > 2. These points give triangles whose external angle-bisector
concurs with the median and altitude. 
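The passage from a rational point to a triangle, and the case analysis above, can be sketched in a few lines. The helper names `triple`, `ceva_holds` and `classify` are hypothetical; the labels D, G, A, N follow the legend of Table 1, and the sample points are ones quoted in the text.

```python
from fractions import Fraction
from math import gcd, lcm

def triple(X, Y):
    """Primitive integer (a, b, c) proportional to (2 - X, Y, X), with a, b >= 0."""
    X, Y = Fraction(X), Fraction(Y)
    a, b, c = 2 - X, Y, X
    if a < 0:
        a, b, c = -a, -b, -c          # change all three signs to make a positive
    b = abs(b)                        # the curve is symmetric in Y
    scale = lcm(a.denominator, b.denominator, c.denominator)
    ints = [int(v * scale) for v in (a, b, c)]
    g = gcd(*ints)
    return tuple(v // g for v in ints)

def ceva_holds(a, b, c):
    """The concurrency condition a(b^2 + c^2 - a^2) = c(c^2 + a^2 - b^2)."""
    return a * (b * b + c * c - a * a) == c * (c * c + a * a - b * b)

def classify(X, Y):
    """D = degenerate, G = good, A = external bisector, N = no real triangle."""
    a, b, c = triple(X, Y)
    small, middle, large = sorted((a, b, abs(c)))
    if large == small + middle:
        return "D"
    if large > small + middle:
        return "N"
    return "G" if c > 0 else "A"

samples = [(2, 2), (1, 1), (6, -14),
           (Fraction(10, 9), Fraction(26, 27)), (Fraction(-7, 4), Fraction(19, 8))]
for X, Y in samples:
    print((X, Y), triple(X, Y), classify(X, Y), ceva_holds(*triple(X, Y)))
```

A negative c is kept as a signed value, matching the convention that it signals the external bisector.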

If X is outside the interval [—2, 2], the triangle inequality is not satisfied. Table 
1 lists a point, chosen so that (2 — X)Y is positive, from each of the first twenty 
pairs; together with the associated triple (a, b,c) and a description of the resulting 
TABLE 1. Points on curve and corresponding triangles.

point   (X, Y)                                  (a, b, c)
P       (2, 2)                                  (0, 1, 1)                                  D
2P      (0, 2)                                  (1, 1, 0)                                  D
-3P     (-2, 2)                                 (2, 1, -1)                                 D
-4P     (1, 1)                                  (1, 1, 1)                                  G(Δ)
5P      (6, -14)                                (2, 7, -3)                                 N
-6P     (8, -22)                                (3, 11, -4)                                N
7P      (10/9, 26/27)                           (12, 13, 15)                               G
8P      (-7/4, 19/8)                            (30, 19, -14)                              A
-9P     (-6/25, 278/125)                        (140, 139, -15)                            A
-10P    (88/49, 554/343)                        (35, 277, 308)                             G
11P     (310, -5458)                            (308, 5458, -310)                          N
-12P    (273/11^2, -3383/11^3)                  (341, 3383, -3003)                         N
13P     (206/31^2, 52894/31^3)                  (26598, 26447, 3193)                       G
-14P    (-3344/39^2, 87326/39^3)                (124527, 43663, -65208)                    N
-15P    (9362/103^2, 1175566/103^3)             (610584, 587783, 482143)                   G
16P     (27105/76^2, -4131247/76^3)             (1182028, 4131247, -2059980)               N
-17P    (256882/151^2, -128313838/151^3)        (31903280, 128313838, -38789182)           N
18P     (589456/695^2, 324783646/695^3)         (130866415, 162391823, 204835960)          G
19P     (-2280402/1247^2, 5023772066/1247^3)    (3360926870, 2511886033, -1421830647)      A
-20P    (-1896655/1939^2, 17691806567/1939^3)   (18257812083, 17691806567, -3677614045)    A


triangle, if any: D means degenerate, G is good, N does not yield a real triangle, 
while A means that the angle-bisector is external. 

The point 11P is a pleasant surprise, though it would be natural to join 5P to 
6P if one were looking for large integer points. Note that there can only be a finite 
number of integer points, i.e., points with integer coordinates. This is Siegel’s 
theorem [see 8, p. 247, Theorem 3.1, for example]. Fortunately for us, any rational 
point will do, because the determination of all integer points requires some
ingenuity. Tzanakis & de Weger [13, 14] have made some progress with this
problem; Zagier’s paper [15] explains the connexion with the magic number g that
we’ll meet below. Indeed, since this paper was first drafted, a method using these
elliptic logarithms has been developed by Stroeker & Tzanakis [11] (and independently
by Gebel, Pethő & Zimmer [5]) and used by Stroeker & de Weger [12] to
settle the problem of the Ochoa curve [6].

As 11P is quite near infinity, 11 serves as an almost period, with 12P near P, 
13P near 2P, etc. so that one can predict that (for some distance), 4P, 7P, 
10P,13P,15P,18P, 21P, 24P, 26P, 29P,32P,... will give good triangles, and that 
8P,9P,19P, 20P, 30P,31P,... will give external bisector ones, although eventu- 
ally there will be a hiccup, when a better approximation to the period takes over. 
About 4/11 of the points give genuine triangles, and about 2/11 give triangles in 
which it is the external bisector which concurs with the median and altitude. If you 
want better approximations to these fractions, or want to know just when the 
hiccup occurs, read on. 

The ‘near periods’ are associated with ‘large’ points, such as 


72P = (4543.72...,306279.98...) 
227P = (6619.74..., —538594.19...) 
299P = (154460.66..., 60705331.35...) 
1722 P = (5373628.48 ..., 12456655569.68... ) 
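These are found from the convergents to the continued fraction of the number g,
defined as

\[ g = \frac{1}{2\omega}\int_{2}^{\infty}\frac{dX}{Y} = 0.8193959921938194669745653771\ldots \]

\[ = [0; 1, 4, 1, 1, 6, 3, 1, 4, 1, 4, 1, 8, 1, 4, 1, 8, 7, 5, 14, 14, 1, 1, 1, 1, 1, 2, \ldots] \]

where $2\omega$ is the real period of the curve (see later for more detail) and the lower
terminal of the integral is the X-coordinate of the generator. The convergents are

\[ \frac{0}{1},\ \frac{1}{1},\ \frac{4}{5},\ \frac{5}{6},\ \frac{9}{11},\ \frac{59}{72},\ \frac{186}{227},\ \frac{245}{299},\ \frac{1166}{1423},\ \frac{1411}{1722},\ \frac{6810}{8311},\ \frac{8221}{10033},\ \frac{72578}{88575},\ \frac{80799}{98608},\ \ldots \]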


whose denominators 5, 6, 11, 72, 227, 299, ... are good candidates for a ‘near
period.’ The lines joining -P to 11P, -72P, 227P, -299P, ... are closer and closer
to the vertical, so that the points -10P, 73P, -226P, 300P, ... are nearer and
nearer to P = (2,2); the signs have been chosen alternately so that the
X-coordinates, 1.7959..., 1.9423..., 1.9520..., 1.9898... are less than 2: remember that
the convergents are alternately less or greater than g. Figure 4 shows part of the
curve magnified to illustrate the near periodicity: note that points closest together
differ by 72P.
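The near periods can be regenerated from the partial quotients quoted above by the usual recurrence for convergents. This is a minimal sketch; `convergents` is an illustrative helper name, and only the first eleven partial quotients from the text are used.

```python
from fractions import Fraction

def convergents(quotients):
    """Convergents p/q of the continued fraction [a0; a1, a2, ...]."""
    p_prev, q_prev = 1, 0
    p, q = quotients[0], 1
    out = [Fraction(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p      # p_n = a_n p_{n-1} + p_{n-2}
        q, q_prev = a * q + q_prev, q      # q_n = a_n q_{n-1} + q_{n-2}
        out.append(Fraction(p, q))
    return out

CF_G = [0, 1, 4, 1, 1, 6, 3, 1, 4, 1, 4]   # leading partial quotients of g
convs = convergents(CF_G)
print([c.denominator for c in convs])
```

The denominators printed from the third convergent onward are the candidate near periods 5, 6, 11, 72, 227, 299, and so on.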


Figure 4. Curve magnified to show 11 and 72 as near periods. 


An elliptic curve over the complex field should be thought of as a torus, with the
real part as a circle, compactified by the point at infinity. There’s a second circle if
the curve has an ‘egg.’ Figure 5 is a diagrammatic representation of the first 25
pairs of points ±kP whose labels are outside the circle and the fractional part of
kg, $kg - \lfloor kg \rfloor$, is written inside the circle. The X-coordinate increases across the
horizontal diameter on some curious scale, presumably related to the Weierstrass
℘-function. The regions of Figure 5 are labelled with the letters from the last
column of Table 1. The ordinates x = -2, 0 and 2 give degenerate triangles, D,
and the ordinate x = 1 corresponds to the equilateral triangle, Δ.

Figure 5. Diagrammatic representation showing near periodicity.


Of course, we’ve now given away our secret. You will have noticed that as you
calculate successive points, the numbers of digits in their coordinates increase in
size rather alarmingly. But the magic number g will tell us just where any point kP 
is: calculate the fractional part of kg and look at Figure 5. For example, 73P gives 
a good triangle (whose sides have about 140 decimal digits!) but 84P does not; 70P 
is good, but 81P requires the external bisector interpretation, as do 74P and 75P. 
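The look-up described above can be imitated without Figure 5. The sketch below rests on one assumption that is not spelled out in the text: the region boundaries on the circle are taken to be the fractional parts belonging to the six degenerate points ±P, ±2P, ±3P of Table 1. The names `frac`, `BOUNDS`, `LABELS` and `label` are illustrative.

```python
from fractions import Fraction

G = Fraction("0.8193959921938194669745653771")   # digits of g quoted in the text

def frac(k):
    """Fractional part of k*g as an exact rational."""
    return (k * G) % 1

# Assumed boundaries: the degenerate points +-P, +-2P, +-3P.
BOUNDS = sorted(frac(j) for j in (1, -1, 2, -2, 3, -3))
# Labels of the seven arcs between consecutive boundaries, starting at 0 (infinity).
LABELS = ["N", "G", "A", "N", "A", "G", "N"]

def label(k):
    t = frac(k)
    if t in BOUNDS:
        return "D"
    return LABELS[sum(1 for b in BOUNDS if b < t)]

print("".join(label(k) for k in range(1, 21)))
print({k: label(k) for k in (70, 73, 74, 75, 81, 84)})
```

Under this assumption the arcs labelled G occupy a fraction 2(1 - g) of the circle, about 0.361, which is close to the 4/11 mentioned earlier.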

Let E denote the set of real solutions (a, b) to the equation $Y^2 = X^3 - 4X + 4$
together with the point at infinity. The real period is defined by the integral

\[ 2\omega = 2\int_{\alpha}^{\infty}\frac{dX}{Y} = 2\int_{\alpha}^{\infty}\frac{dX}{\sqrt{X^3 - 4X + 4}}, \]

where $\alpha$ is the real root of $X^3 - 4X + 4$. Then it is true (but not so easy to prove)
that there is a group isomorphism

\[ \phi : E \to \frac{\mathbb{R}}{2\omega\mathbb{Z}}, \qquad (a, b) \mapsto \int_{a}^{\infty}\frac{dX}{Y}. \]

Thus the magic number g is really g = φ(P), and this explains exactly why kg
being near to $2\omega\mathbb{Z}$ is equivalent to kP being close to the point at infinity.
Here are the fractional parts of kg for the best candidates:

k                72        227       299        1423       1722       8311
kg - ⌊kg⌋        0.9965    0.0029    0.999402   0.000497   0.999899   0.000091


Problem for experts. Good approximations to a continued fraction come from 
truncating it just before a large partial quotient. Our continued fraction doesn’t 
display any spectacular partial quotients, but those for several curves do. For 
example, curve 37A, $y(y + 1) = x(x^2 - 1)$, has, for the magic number associated
with its generator, (0, 0): 


[0;3,4,1,1,5, 2, 168, 46793, 1, 7,1,51,1,7,1,6,2,1,1,1,10,1,2,10,1, 2,11, 
16,3,1,1,1,1,4,1,1,3,1,1,5,5, 25,1, 34, 10, 2, 18, 10, 585, 1, 2,3, 1,1, 440, 1, 
1,7,2,1,4,6,16,5,2,3,2,5,1,1, 77,1, 2,1,1,1, 13,51,3,1,2,1,4,4,3,1, 10,5, 
1,1,1,2,1,32,8,1,2,1,4,61,...] 

What is going on? Something akin to what is described by Stark in [10]? 


Isosceles Heron triangles. Colleague Bill Sands is always looking for problems for 
Crux Mathematicorum; he asked if there were triangles with integer sides and area 
associated with rectangles having the same perimeter and area. There are indeed 
many such, but none of them right-angled, which is what he originally asked for. 
This last statement can be confirmed via curve 14A4, which has rank 0 and whose 
six torsion points yield only degenerate triangles. A discussion of the general 
problem may appear elsewhere; and see the last section for an introduction. 

But here we find an infinite family of isosceles triangles. Let the equal legs be
$m^2 + n^2$ and the base be $2(m^2 - n^2)$ so that the altitude is $2mn$:

\[ \text{the semiperimeter} = p + q = 2m^2 \quad\text{and the area} = pq = 2mn(m^2 - n^2), \]

where p and q are the sides of the associated rectangle. So we require that

\[ (p - q)^2 = 4m^4 - 8mn(m^2 - n^2) \]

shall be a perfect square. If we write

\[ X = \frac{2n}{m}, \qquad Y = \frac{p - q}{m^2}, \]

what do we get?

\[ Y^2 = X^3 - 4X + 4. \]


This time all rational points give rational triangles which are realized geometri- 
cally, provided that when n is outside the interval [0, m] we are willing to consider 
negative lengths and areas. In calculating the perimeters, sometimes the base of 
the triangle or one of the sides p, q of the rectangle must be taken as negative. 
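The construction can be traced numerically from a rational point. This is a sketch only: `heron_pair` is a hypothetical helper, it takes n/m = X/2 in lowest terms with m > 0, and it halves all lengths when m and n are both odd so that the triangle stays primitive.

```python
from fractions import Fraction

def heron_pair(X, Y):
    """Return (equal leg, base, altitude, p, q) for a point (X, Y) on the curve."""
    X, Y = Fraction(X), Fraction(Y)
    if Y * Y != X**3 - 4 * X + 4:
        raise ValueError("point is not on the curve")
    ratio = X / 2                                  # n/m in lowest terms, m > 0
    m, n = ratio.denominator, ratio.numerator
    legs, base, altitude = m * m + n * n, 2 * (m * m - n * n), 2 * m * n
    p = (2 * m * m + Y * m * m) / 2                # p + q = 2m^2, p - q = Y m^2
    q = (2 * m * m - Y * m * m) / 2
    if m % 2 == 1 and n % 2 == 1:                  # both odd: halve every length
        legs, base, altitude = legs // 2, base // 2, altitude // 2
        p, q = p / 2, q / 2
    return legs, base, altitude, p, q

for point in [(1, 1), (Fraction(10, 9), Fraction(26, 27)), (6, -14)]:
    legs, base, altitude, p, q = heron_pair(*point)
    assert 2 * legs + base == 2 * (p + q)          # common perimeter
    assert Fraction(base * altitude, 2) == p * q   # common area
    print(point, (legs, legs, base), altitude, (p, q))
```

The third sample has n > m, so its base and one rectangle side come out negative, as the text anticipates.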

For comparison with the first family of triangles we use the same multiples of P 
as before, though now a change in sign of Y merely interchanges the roles of p
and q. Write $X = x/d^2$, $Y = y/d^3$ where x, y, d are integers with d > 0, $x \perp d$,
$y \perp d$ (that is, x and y are each prime to d). Note that x and y are not necessarily
prime to one another: in fact $x_k$ and $y_k$ are both even unless k is a multiple of 4,
when they are both odd, while $d_k$ is odd unless k is a multiple of 8.

We have seen that $(X_{k+1}, Y_{k+1})$ may be found by joining $(X_k, Y_k)$ to
$P = (X_1, Y_1) = (2, 2)$:

\[ X_{k+1} = \frac{2(X_k^2 - 2Y_k)}{(X_k - 2)^2}, \qquad Y_{k+1} = \frac{2(X_k^3 + 6X_k^2 - 12X_k + 8 - 4X_kY_k)}{(X_k - 2)^3}, \]

\[ \frac{x_{k+1}}{d_{k+1}^2} = \frac{2(x_k^2 - 2y_kd_k)}{(x_k - 2d_k^2)^2}, \qquad \frac{m_{k+1}}{n_{k+1}} = \frac{(x_k - 2d_k^2)^2}{x_k^2 - 2y_kd_k}. \]

We choose $m \perp n$ and m > 0; the g.c.d., $(m_{k+1}, n_{k+1})$, of the numerator and
denominator of the last fraction is $2d_{k-1}^2$, $16d_{k-1}^2$, $4d_{k-1}^2$ or $4d_{k-1}^2$ according as
k ≡ 0, 1, 2, or 3 mod 4.

Table 2 lists information about the first 20 isosceles triangles and is parallel to
Table 1. We do not list (m, n) since these are $(2d^2, x)$ or $(d^2, x/2)$ according as 4
divides k or not. If m and n are both odd, as they are when k is odd, we keep the
triangle primitive by dividing all lengths by 2. The rectangle sides p and q are
$2d(2d^3 \pm y)$, or have 1/4 or 1/8 of those values, according as 4 | k, 2 ‖ k or k is odd. As
they are each divisible by d, primitive rectangles can only be given by integer
points, so that k = 5 and k = 11 are the only nontrivial examples.

The labels are the same as before, except that the interpretation of A is now:
altitude and area are negative and the rectangle p × q has q < 0 < p, while N
now means that the base of the triangle is negative, the altitude is positive or
negative according as n > m or n < -m, the area and p each have sign opposite
to that of the altitude, and q > 0. The latter case is exemplified by 14P where the
area is positive, but in calculating the perimeter of the triangle, its base must be
taken as negative.


Shapes of triangle. In each problem, as the point moves on the curve, the shape of 
the triangle changes continuously. As the rational points are dense on the curve, 
we can approximate to any shape of triangle that is consistent with the geometrical 
properties that have been imposed. 

The cevians triangle, for example, can be as near right-angled at C as we wish.
Choose a point with X-coordinate as near to $\sqrt{5} - 1 = 1.236\ldots$ as required. The
point 18P gives a triangle with A = 39.68°, B = 52.40°, C = 87.92°. The other
angles approach 90° simultaneously, though not quite at the same speed, as the
triangle degenerates when we approach X = 0; the 70P triangle has A =
89.95°, B = 88.32°, C = 1.73°. In this problem the triangle can be equilateral,
corresponding to the point 4P, and points close by to the left or right give triangles
with one or two angles less than 60°: 68P: (A, B, C) = (62.40°, 60.98°, 56.60°);
76P: (A, B, C) = (57.51°, 59.02°, 63.47°).

The Heron triangles are isosceles, so don’t display such variety. They vary from 
degeneracy one way to the other: this incarnation of 70P gives base angles of 3.35°, 
while 73P corresponds to base angles of 88.325°. The vertical angle can also be as 
near to 90° as we wish: the points 15P, 31P, 41P and 57P give 84.7°, 93.8°, 86.3°
and 87.9°.

The Heron triangles cannot be equilateral, but we can approximate by taking
points near to the maxima and minima of the curve, $X = \pm 2/\sqrt{3} = \pm 1.1547\ldots$.
Already 4P: (5,5,6) and 7P: (53, 53,56) are quite good. Next better is 30P with 
base angles 60.525°. 
TABLE 2. Isosceles triangles (m^2 + n^2, m^2 + n^2, 2(m^2 - n^2)) and
rectangles p × q with common perimeter and area.

  k          x             y      d          altitude       equal legs             base
  1          2             2      1                 1                1                0
  2          0             2      1                 0                1                2
 -3         -2             2      1                -1                1                0
 -4          1             1      1                 4                5                6
  5          6           -14      1                 3                5               -8
 -6          8           -22      1                 8               17              -30
  7         10            26      3                45               53               56
  8         -7            19      2              -112              113               30
 -9         -6           278      5               -75              317              616
-10         88           554      7              4312             4337              930
 11        310         -5458      1               155            12013           -24024
-12        273         -3383     11            132132           133093           -31930
 13        206         52894     31             98983           467065           912912
-14      -3344         87326     39          -5086224          5109025          -964286
-15       9362       1175566    103          49660729         67231321         90639120
 16      27105      -4131247     76         626233920        868129729      -1202464642
-17     256882    -128313838    151        2928583241       8508488041     -15977204880
 18     589456     324783646    695      284721984400     320177744609     292897113282
 19   -2280402    5023772066   1247    -1773022816809    1859055655241    1117994669680
-20   -1896655   17691806567   1939   -28523574533020   60139308180389  105889415604678

  k                            area    rectangle (p, q)
  1                               0    (1, 0)                                D
  2                               0    (2, 0)                                D
 -3                               0    (1, 0)                                D
 -4                              12    (6, 2)                                G
  5                             -12    (-3, 4)                               N
 -6                            -120    (-10, 12)                             N
  7                            1260    (60, 21)                              G
  8                           -1680    (140, -12)                            A
 -9                          -23100    (660, -35)                            A
-10                         2005080    (462, 4340)                           G
 11                        -1861860    (-1364, 1365)                         N
-12                     -2109487380    (-15862, 132990)                      N
 13                     45181384248    (871689, 51832)                       G
-14                   2452287298032    (4016298, 610584)                     N
-15                2250602387559240    (86546265, 26004616)                  G
 16             -376512073310528320    (-494500840, 761398248)               N
-17           -23395287224795708040    (-4583904584, 5103790185)             N
 18         41697123659341318400400    (346175467610, 120450833640)          G
 19       -991115029206740553325560    (2775187436616, -357134446535)        A
-20   -1510172319128981992388733780    (125150833858190, -12066817875462)    A


A third manifestation. With help from Andrew Bremner we are investigating the 
general problem of finding triangle-rectangle pairs with common perimeter and 
common area. 

Brahmagupta taught us that all Heron triangles are of shape

\[ c(a^2 + b^2), \quad b(a^2 + c^2), \quad (b + c)(a^2 - bc), \]

which, if we take the third side as base, has altitude $2abc$, area $A = abc(b + c)(a^2 - bc)$
and semiperimeter $s = a^2(b + c)$.

If the associated rectangle is p × q, then we have A = pq, s = p + q, and

\[ (p - q)^2 = a^4(b + c)^2 - 4abc(b + c)(a^2 - bc) \]

must be a perfect square. Set $\mathcal{Y} = (p - q)/a^2(b + c)$, $\mathcal{X} = bc/a^2$, $\mathcal{Z} = a/(b + c)$
and the equation becomes

\[ \mathcal{Y}^2 = 1 - 4\mathcal{X}\mathcal{Z} + 4\mathcal{X}^2\mathcal{Z}. \]

However, in order that this transformation be birational, we also require that

\[ 1 - 4\mathcal{X}\mathcal{Z}^2 = \left(\frac{b - c}{b + c}\right)^2 = \mathcal{T}^2 \]

be a perfect square. On eliminating $\mathcal{Z}$,

\[ (\mathcal{Y}^2 - 1)^2 = 16\mathcal{X}^2\mathcal{Z}^2(\mathcal{X} - 1)^2 = 4\mathcal{X}(\mathcal{X} - 1)^2(1 - \mathcal{T}^2) \]

we have a quintic surface [which deserves study in its own right]. It contains a
dozen straight lines, two of which, $\mathcal{X} = 1$, $\mathcal{Y} = \pm 1$, are double, so that a plane
through either of them, say

\[ n(\mathcal{Y} - 1) = m(\mathcal{X} - 1) \]

cuts the surface in a cubic curve.

So we can find “all” triangle-rectangle pairs in the following sense. Such a pair
corresponds to a rational point on the quintic surface. This determines (m, n), the
‘slope’ of the plane through the point and the line $\mathcal{X} = 1$, $\mathcal{Y} = 1$. Elimination of $\mathcal{Y}$
between the surface and the plane yields, on writing $x = -m^4\mathcal{X}$, $y = 2m^4n^2\mathcal{X}\mathcal{T}$:

\[ y^2 = x\left[x^2 + 2(m^4 - 2m^3n + 2n^4)x + m^6(m - 2n)^2\right], \]

an elliptic curve whose rational points give all triangle-rectangle pairs of ‘slope’
(m, n). We are studying the range 0 < |m| < n < 50.

The discriminant of the curve is $4m^{12}n^4(m - 2n)^4(m^4 - 2m^3n + n^4)$ and the
curve is singular just if m = 0, n = 0, m = n or m = 2n. The torsion group is
Z/4Z, the points $(-m^3(m - 2n), \pm 2m^3n^2(m - 2n))$ being of order 4. However,
if $m^4 - 2m^3n + n^4 = r^2$ is a perfect square, then the torsion group is Z/4Z ×
Z/2Z, there being additional points $(m^3(m - 2n), \pm 2m^3r(m - 2n))$ of
order 4 and $(-(r - n^2)^2, 0)$ and $(-(r + n^2)^2, 0)$ of order 2.

When is $m^4 - 2m^3n + n^4$ a perfect square? Put

\[ \frac{r}{m^2} = \frac{n^2}{m^2} - \frac{X}{2}, \qquad \frac{n}{m} = \frac{Y + 2}{2X} \]

and what do you get?
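A quick exact check is possible here. The sketch assumes that the garbled substitution reads n/m = (Y + 2)/(2X), and that the quantity in question is m^4 - 2m^3n + n^4; both are reconstructions rather than confirmed readings. The helper `is_rational_square` is illustrative, and the sample points are taken from Table 1 (with either sign of Y).

```python
from fractions import Fraction
from math import isqrt

def is_rational_square(value):
    """True when a non-negative rational in lowest terms is a perfect square."""
    num, den = value.numerator, value.denominator
    return num >= 0 and isqrt(num) ** 2 == num and isqrt(den) ** 2 == den

points = [(Fraction(1), Fraction(1)), (Fraction(6), Fraction(14)),
          (Fraction(10, 9), Fraction(26, 27)), (Fraction(-7, 4), Fraction(19, 8))]
ratios = []
for X, Y in points:
    assert Y * Y == X**3 - 4 * X + 4       # the point lies on the favorite curve
    t = (Y + 2) / (2 * X)                  # assumed reading of n/m
    value = t**4 - 2 * t + 1               # (m^4 - 2 m^3 n + n^4) / m^4
    assert is_rational_square(value)
    ratios.append(t)
print(ratios)
```

For instance the point (6, 14) gives n/m = 4/3, and 3^4 - 2·3^3·4 + 4^4 = 121 = 11^2.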
I have never done anything “useful”. No discovery of mine has
made, or is likely to make, directly or indirectly, for good or ill, the
least difference to the amenity of the world ... Judged by all
practical standards, the value of my mathematical life is nil; and
outside mathematics it is trivial anyhow. I have just one chance of
escaping a verdict of complete triviality, that I may be judged to have
created something worth creating. And that I have created something
is undeniable: the question is about its value.


—Godfrey H. Hardy (1877-1947) 


A Mathematician’s Apology, p. 150. Cambridge: Cambridge
University Press.
Stalking the Wild Ellipse


Keith M. Kendig 


It’s not very well known, but the area of any ellipse $Ax^2 + Bxy + Cy^2 = 1$ is

\[ \frac{2\pi i}{\sqrt{B^2 - 4AC}}. \]

The following examples reveal two different sides to this formula:

1. $x^2 + xy + y^2 = 1$ is an ellipse tilted from the horizontal; from A = B = C = 1
we easily find its area, $2\pi/\sqrt{3}$.

2. $x^2 + 4xy + y^2 = 1$ is a hyperbola. Its area is unbounded, yet A = C = 1, B = 4
produces a definite result, $\pi i/\sqrt{3}$.

What’s going on? Is this last answer nonsense, or is the formula sending us 
signals, perhaps trying to tell us something informative? It turns out that if
$B^2 - 4AC \neq 0$, then $Ax^2 + Bxy + Cy^2 = 1$ always defines an ellipse, somewhere,
and the formula tells us its area. The key here is “somewhere”: far from being a
simple beast, the ellipse has a decided preference for privacy. In fact if, in a 
specific sense, we “choose A, B,C at random”, the probability is over 81% that the 
ellipse is hidden from normal view. When he’s away, then with probability 77% 
we’ll get a hyperbola to look at. But our formula follows the ellipse wherever he 
goes. In this article, we hitch a ride, penetrating this private side of the ellipse, and 
report to you some unexpected findings. 


Building a Cage. Most calculus students know the ellipse in “standard form”,
gentle and well-behaved:

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]

Generations of students easily measure and photograph it, using semi-axes a
and b. Its area is $\pi ab$ ($A = 1/a^2$, $B = 0$, $C = 1/b^2$).

Somewhat later, perhaps in a linear algebra course, the ellipse is fed a little 
mixed term Bxy, and at once it begins to stir, becoming more camera-shy. Students 
are supplied with correspondingly higher-tech gear, like eigenvectors. These can be 
used to build a cage around him; thus cornered, the ellipse relents and poses. But 
feed him more than a certain critical level of Bxy (in fact, do anything to the 
coefficients to make $B^2 - 4AC$ positive) and the ellipse goes wild, disappearing
into parts unknown. We can’t just take off in hot pursuit, for we need some idea of 
where to look. Recalling some particulars about making that cage will help us, and 
will also show how we came up with the area formula. 
The Basic Eigenrecipe for Ellipses.

1. Find the roots $\lambda_1$, $\lambda_2$ of the quadratic equation

\[ \det\begin{pmatrix} A - \lambda & B/2 \\ B/2 & C - \lambda \end{pmatrix} = 0. \tag{1} \]

2. For each $\lambda_i$, find a real unit vector $v_i$ satisfying

\[ \begin{pmatrix} A - \lambda_i & B/2 \\ B/2 & C - \lambda_i \end{pmatrix} v_i = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

If $\lambda_1 = \lambda_2$, choose $v_1 \perp v_2$. (If $\lambda_1 \neq \lambda_2$, orthogonality is automatic.)

3. Draw the rectangle having vertices

\[ \pm\frac{1}{\sqrt{\lambda_1}}v_1 \pm \frac{1}{\sqrt{\lambda_2}}v_2; \]

it surrounds the ellipse, which has parametric equations

\[ v = \frac{\cos(t)}{\sqrt{\lambda_1}}v_1 + \frac{\sin(t)}{\sqrt{\lambda_2}}v_2. \]

The ellipse’s semi-axes are $1/\sqrt{\lambda_1}$ and $1/\sqrt{\lambda_2}$; its area is therefore $\pi/\sqrt{\lambda_1\lambda_2}$.
The little-known area formula follows from this, because $\lambda_1\lambda_2$, being the product of
the two roots, is just the constant term of the quadratic; that is, it’s what we get
by putting $\lambda = 0$ in (1). So

\[ \lambda_1\lambda_2 = \det\begin{pmatrix} A & B/2 \\ B/2 & C \end{pmatrix}, \]

which is $AC - B^2/4$; this is almost the discriminant $B^2 - 4AC$. Rewriting slightly
then gives us our formula.


Stalking the Ellipse. How does the eigenrecipe help to locate a hidden ellipse?
Though we stated it for visible ellipses, let’s try applying the recipe directly to our
example $x^2 + 4xy + y^2 = 1$. The results are:

Eigenvalues, eigenvectors: $\lambda_1 = 3$, $v_1 = \frac{1}{\sqrt{2}}(1, 1)$; $\lambda_2 = -1$, $v_2 = \frac{1}{\sqrt{2}}(-1, 1)$.
Rectangle vertices: $\pm(1/\sqrt{6}, 1/\sqrt{6}) \pm (-i/\sqrt{2}, i/\sqrt{2})$.
Parametric eq’ns: $x = \cos(t)/\sqrt{6} - i\sin(t)/\sqrt{2}$; $y = \cos(t)/\sqrt{6} + i\sin(t)/\sqrt{2}$.

If following the recipe is actually valid, then these results suggest using space
based on C, not R. In fact, our parametric equations do define an ellipse in
$C^2$, and one can check that the equations indeed satisfy $x^2 + 4xy + y^2 = 1$. As it
turns out, the eigenrecipe has pointed to the truth: no ellipse really takes $R^2$
seriously as its world, but rather considers $C^2$ as its native, rightful living space.
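The check mentioned above is easy to automate. What follows is a sketch under stated assumptions: the function names are illustrative, square roots of negative eigenvalues are taken with Python's principal branch (`cmath.sqrt`), and the chosen eigenvectors may differ in sign from those in the text, which does not affect the locus.

```python
import cmath
import math

def eigenrecipe(A, B, C):
    """Eigenvalues and orthonormal real eigenvectors of [[A, B/2], [B/2, C]]."""
    half_b = B / 2
    mean, radius = (A + C) / 2, math.hypot((A - C) / 2, half_b)
    lams = (mean + radius, mean - radius)
    if half_b == 0:
        vecs = ((1.0, 0.0), (0.0, 1.0)) if A >= C else ((0.0, 1.0), (1.0, 0.0))
    else:
        vecs = []
        for lam in lams:
            vx, vy = half_b, lam - A               # solves (A - lam) vx + (B/2) vy = 0
            norm = math.hypot(vx, vy)
            vecs.append((vx / norm, vy / norm))
    return lams, tuple(vecs)

def point(t, A, B, C):
    """Point cos(t)/sqrt(l1) v1 + sin(t)/sqrt(l2) v2; requires B^2 - 4AC != 0."""
    (l1, l2), (v1, v2) = eigenrecipe(A, B, C)
    s1 = math.cos(t) / cmath.sqrt(l1)
    s2 = math.sin(t) / cmath.sqrt(l2)
    return (s1 * v1[0] + s2 * v2[0], s1 * v1[1] + s2 * v2[1])

def area(A, B, C):
    """The area formula 2*pi*i / sqrt(B^2 - 4AC), principal square root."""
    return 2j * math.pi / cmath.sqrt(B * B - 4 * A * C)

for t in (0.0, 0.3, 1.0, 2.5):
    x, y = point(t, 1, 4, 1)
    assert abs(x * x + 4 * x * y + y * y - 1) < 1e-12   # stays on the locus
print(eigenrecipe(1, 4, 1)[0], area(1, 1, 1), area(1, 4, 1))
```

The products in the check are plain complex products with no conjugation, which is what lets the complex parametrization satisfy the original equation.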

If this is so, what does everything look like in $C^2$? Unfortunately, most of us are
adept at seeing in at most three dimensions. Those rare individuals who can
visualize well in four dimensions could look at $x^2 + 4xy + y^2 = 1$ in $C^2$, taking x
and y to be complex rather than only real. Such a person would see a very rich
world; for instance $x^2 + 4xy + y^2 = 1$ defines a (real, 2-d) surface there, and
the ellipse would be seen, comfortably sitting in that surface. What’s the surface
look like? How is the ellipse contained in it? Where does the original hyperbola fit
in?

Though we may not have 4-d eyes, we do have brains capable of concocting,
on occasion, admirably clever schemes. We are going to try using some strategy,
together with our 3-d skills, to help answer those questions and to see the good
sense to the imaginary answer. Our chances of succeeding are improved if we use 
something in standard form, so let’s try this: 


Understand $x^2 - y^2 = 1$.


Here A = 1, B = 0, C = -1, so apparently the area of some ellipse is $\pi i$.
Let us write our complex variables x and y as $x = x_1 + ix_2$, $y = y_1 + iy_2$. Then
$x^2 - y^2 = 1$ becomes

\[ (x_1 + ix_2)^2 - (y_1 + iy_2)^2 = 1. \]

Note that our original picture corresponds to $x_2 = 0$ and $y_2 = 0$. Now expanding
and equating real and imaginary parts gives

\[ x_1^2 - x_2^2 - y_1^2 + y_2^2 = 1, \qquad x_1x_2 - y_1y_2 = 0. \tag{2} \]

Our surface in $R^4$ is the common solution set of these two equations. If we take
3-d slices of this, we’ll usually see a space curve. For us, the slice $x_2 = 0$ will be a
fortunate choice.

To see the part of the locus within this 3-d slice, put $x_2 = 0$ in (2). We get

\[ x_1^2 - y_1^2 + y_2^2 = 1, \qquad y_1y_2 = 0. \]

The second equation simplifies things: $y_1y_2 = 0$ implies that either $y_1 = 0$ or
$y_2 = 0$, so every point in the locus is either in the $(x_1, iy_2)$-plane (corresponding to
$y_1 = 0$) or in the $(x_1, y_1)$-plane (when $y_2 = 0$). (For brevity, we’ll call these planes
$R_{x_1,iy_2}$ and $R_{x_1,y_1}$, respectively.) If $y_2 = 0$, we get $x_1^2 - y_1^2 = 1$ which of course is
what we originally had. If $y_1 = 0$, then we have $x_1^2 + y_2^2 = 1$. Here’s a sketch of the
part of the locus within this 3-d slice:

The part in $R_{x_1,y_1}$ is our original hyperbola. The part in $R_{x_1,iy_2}$ is our coveted
glimpse, and it’s a circle there. Now the unit of measure in the $x_1$-axis is 1, and in
the $iy_2$-axis it is i; thus the unit of area in $R_{x_1,iy_2}$, the $(x_1, iy_2)$-plane, is 1·i = i. In
this plane, our circle is an ellipse having semi-axes 1 and i, so it has area
$\pi ab = \pi i$. This is just what our area formula gives. In $C^2$, the eigenspaces
corresponding to $\lambda_1 = +1$ and $\lambda_2 = -1$ are $C_x$ and $C_y$, respectively. In Figure 1
we can view all of $C_y$, so when the eigenrecipe says to draw the rectangle with
vertices (±1, ±i), we can really do it: the square lies in the $(x_1, iy_2)$-plane, neatly
containing the circle there.

Figure 1.
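The two pieces of the slice can be verified directly with complex arithmetic. This is a small sketch with illustrative names: points x = cos t, y = i sin t trace the circle in the $(x_1, iy_2)$-plane, and points x = cosh s, y = sinh s trace one branch of the original hyperbola.

```python
import cmath
import math

def residual(x, y):
    """How far (x, y) is from satisfying x^2 - y^2 = 1."""
    return abs(x * x - y * y - 1)

def satisfies_system_2(x, y, tol=1e-12):
    """Real and imaginary parts of x^2 - y^2 = 1, written as the system (2)."""
    x1, x2, y1, y2 = x.real, x.imag, y.real, y.imag
    real_part = x1 * x1 - x2 * x2 - y1 * y1 + y2 * y2
    imag_part = x1 * x2 - y1 * y2
    return abs(real_part - 1) < tol and abs(imag_part) < tol

circle = [(complex(math.cos(t)), 1j * math.sin(t)) for t in (0.0, 0.5, 1.7, 3.0)]
hyperbola = [(complex(math.cosh(s)), complex(math.sinh(s))) for s in (-1.0, 0.0, 0.8)]
for x, y in circle + hyperbola:
    assert residual(x, y) < 1e-12 and satisfies_system_2(x, y)

# Area of the "ellipse" with semi-axes 1 and i, against the area formula with
# A = 1, B = 0, C = -1.
semi_axis_area = math.pi * 1 * 1j
formula_area = 2j * math.pi / cmath.sqrt(0 * 0 - 4 * 1 * (-1))
print(semi_axis_area, formula_area)
```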


A picture, basically the same as Figure 1, is obtained for $x^2 + 4xy + y^2 = 1$ in
much the same way: a 45° rotation of coordinates in $R_{x_1,y_1}$ puts the equation into
standard form; the slice $x_2 + y_2 = 0$ then does for this example what $x_2 = 0$ just
did for $x^2 - y^2 = 1$.


What’s the Surface? Topologically, it’s trying to be a sphere. For any $Ax^2 + Bxy +
Cy^2 = 1$ ($B^2 - 4AC \neq 0$), there is some linear combination f of $x_1$, $x_2$, $y_1$, $y_2$ so
that the slice f = 0 is the union of an ellipse and a hyperbola, looking essentially
like Figure 1. Topologically, the ellipse can be thought of as the “equator”, and the
two branches of the hyperbola as two lines of longitude crossing diametrically
opposite points on that equator. The two branches approach the north and south
poles. We may add those poles as two “points at infinity”, completing the surface
to a sphere. The two circles (equatorial and longitudinal) divide the sphere into
four quarters. Within each 3-d slice given by f = a positive constant, the curves
all have two separate branches. In the sphere, the two “ends” of any such branch
meet at either the north or the south pole. Figure 2 shows how these branches fill
in two diametrically opposite quarters; curves corresponding to f = a negative
constant similarly fill in the other two quarters. Only at f = 0 do we get a curve in
a 3-d slice with a finite loop.


Great Escapes. Our first two examples $x^2 + xy + y^2 = 1$ and $x^2 + 4xy + y^2 = 1$
show different sides of the area formula, but increasing xy to 4xy leads from one
to the other, the first ellipse disappearing from normal view in the process. What’s
the journey like? The kind of transition $x^2 + Bxy + y^2 = 1$ makes as B goes from
1 to 4 is also shown in this especially simple example:


Track $x^2 + Cy^2 = 1$ as C varies from +1 to -1.


The process takes place in the slice $x_2 = 0$, in particular in $R_{x_1,y_1}$ and $R_{x_1,iy_2}$.
As C decreases to 0, the circle in $R_{x_1,y_1}$ stretches vertically becoming, by the time C
reaches 0, the two vertical lines $x_1^2 = 1$. But a little miracle always occurs precisely
at the stage when $B^2 - 4AC = 0$: the pictures in the first and second planes are
congruent (and, in addition, they’re both ∅ or degenerate). This acts as a kind of
bridge connecting the pictures as $B^2 - 4AC$ changes sign. In our case, within the
slice $x_2 = 0$ the locus of $x^2 = 1$ consists of two parallel lines $x_1^2 = 1$ in $R_{x_1,y_1}$,
together with two parallel lines $x_1^2 = 1$ in $R_{x_1,iy_2}$. As C decreases from 0, these last
two lines close in to make an ellipse, playing in reverse what just happened in
$R_{x_1,y_1}$.

Figure 2.

So far we’ve trained our eye solely on the ellipse (that’s what the formula does),
but one can profitably take broader perspectives. For instance, the whole
surface varies as C varies, and one can follow this along, at least topologically.
There’s also a middle-ground perspective that can be enlightening: If $x = x_1 + ix_2$
and $y = y_1 + iy_2$ are coordinates in eigenspaces $C_x$ and $C_y$, the four canonical
slices $R_{x_1,y_1}$, $R_{x_1,iy_2}$, $R_{ix_2,y_1}$ and $R_{ix_2,iy_2}$ have a story to tell of their own. For one
thing, whenever $B^2 - 4AC \neq 0$, looking at what’s in each of the four slices always
reveals one ellipse, two hyperbolas, and ∅. Another fact: as $B^2 - 4AC$ changes sign,
the slices always pair up and exchange views. In our last example for instance,
ellipses and hyperbolas exchange in $R_{x_1,y_1}$ and $R_{x_1,iy_2}$; in the pair $R_{ix_2,y_1}$ and
$R_{ix_2,iy_2}$, the other hyperbolas exchange with ∅.

These facts deserve another example; let’s consider $x^2 + y^2 = k$ as k goes from
+1 to -1 (so in this case A and C grow without bound). When k > 0, we get
circles in $R_{x_1,y_1}$ and ∅ in $R_{ix_2,iy_2}$. When k = 0, the circle has become a point, which
we also see in $R_{ix_2,iy_2}$. (This of course dovetails with “area = 0” in our formula.) For
k < 0, views exchange: the circles are in $R_{ix_2,iy_2}$ and we see nothing in $R_{x_1,y_1}$. In
$R_{x_1,iy_2}$ and $R_{ix_2,y_1}$, hyperbolas exchange. When k = 0, one sees two crossing lines in
each of $R_{x_1,iy_2}$ and $R_{ix_2,y_1}$.

Finally, one can follow the topological changes in the surface. In the last
example, for instance, the surface for $x^2 + y^2 = k$ is a sphere if k ≠ 0, and the
equator squeezes to a point as k becomes 0. A sphere with equator pinched to a
point is topologically two spheres touching at one point. Algebraically, $x^2 + y^2 - k$
breaks up into linear factors precisely when k = 0; the algebraic and topological
pictures thus reflect each other. Also, the longitudes (hyperbola), which form a
topological loop when k ≠ 0, pinch to a figure “8” when k = 0: those are the two
lines x + iy = 0 and x - iy = 0; each line closes to a loop at infinity, and the
two loops meet at the origin.

The interested reader may wish to contemplate a few other questions:

1. In general, when should we take the negative radical in our area formula?

2. Generalize the eigenrecipe so it works for all four canonical views. (One can
use $1/\sqrt{\pm\lambda_i}$ in place of $1/\sqrt{\lambda_i}$, getting four congruent rectangles.)

3. As an application of the above question, find the parametric equations
(involving hyperbolic functions) for “the other hyperbola” defined by $x^2 + 4xy
+ y^2 = 1$. What simple change of variable transforms these to the parametric
equations we met earlier for the ellipse? (Each set of equations describes the
entire surface as t varies over C.)

4. Trace the topological history of the surface defined by $x^2 + Cy^2 = 1$ as C
varies from +1 to -1.

5. For fixed $A_0$ and $C_0$, $A_0x^2 + Bxy + C_0y^2 = 1$ describes a family of loci. Our
formula gives the same result for B as for -B, raising the possibility that the
corresponding loci might themselves be the “same” in some sense. Is there a
simple linear isometry, depending only on $A_0$ and $C_0$, which for each B maps
the loci $A_0x^2 \pm Bxy + C_0y^2 = 1$ into each other?

6. Let’s define “choosing A, B, C at random” to mean picking an arbitrary point
in a coordinate box in (A, B, C)-space, centered at the origin. A point
corresponds to an ellipse exactly when A > 0, C > 0, and $B^2 - 4AC < 0$.
Show that the proportion of this box corresponding to an ellipse in the
(x, y)-plane is

\[ \frac{31 - 3\ln 4}{144} \approx .1864. \]

Notice that since $B^2 - 4AC$ is homogeneous, $B^2 - 4AC = 0$ is a cone, and
therefore this number is independent of the size of the box.

For further reading on viewing conics and other elementary curves in their natural
habitat of $C^2$, and the surprises one is likely to meet, see [1] (Chapters I and
II) or [2].
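The proportion in question 6 can be checked numerically without giving away the derivation. The sketch below assumes the box is $[-1, 1]^3$ (the text notes that the size does not matter) and uses a plain midpoint rule; the function names are illustrative. The last lines rest on one further assumption: the region A < 0, C < 0, $B^2 - 4AC < 0$ has the same volume by symmetry and gives an empty real locus.

```python
import math

def ellipse_fraction_exact():
    """The closed form (31 - 3 ln 4) / 144 stated in question 6."""
    return (31 - 3 * math.log(4)) / 144

def ellipse_fraction_grid(n=400):
    """Midpoint-rule estimate of the share of the box [-1, 1]^3 with
    A > 0, C > 0 and B^2 < 4AC."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        a = (i + 0.5) * h
        for j in range(n):
            c = (j + 0.5) * h
            # admissible B lie in (-bound, bound), with bound capped by the box
            total += min(1.0, 2.0 * math.sqrt(a * c))
    return 2.0 * total * h * h / 8.0           # B-interval has length 2*bound; box volume 8

exact = ellipse_fraction_exact()
estimate = ellipse_fraction_grid()
hidden = 1 - exact                             # ellipse not visible in the real plane
hyperbola_given_hidden = (1 - 2 * exact) / hidden
print(round(exact, 4), round(estimate, 4), round(hidden, 4), round(hyperbola_given_hidden, 2))
```

These figures agree with the "over 81%" and "77%" quoted in the introduction.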
The Role of Transitivity in Devaney’s
Definition of Chaos


Annalisa Crannell


1. INTRODUCTION. Devaney’s definition of chaos for discrete dynamical systems
is one of the most popular and most widely known. It says a function f:
M → M is chaotic if


(1) f is transitive; that is, for any pair of non-empty open sets U and V in M,
there is some k > 0 with $f^k(U) \cap V \neq \emptyset$;


(2) the periodic points of f are dense in M; and


(3) f displays the famous condition, sensitive dependence on initial conditions: there
is a number δ > 0 depending only on M and f, so that in every non-empty
open subset of M one can find a pair of points whose eventual iterates under f
are separated by a distance of at least δ.


Here M is generally a subset of $R^n$, and $f^n$ means f composed with itself n times,
so that, for example, $f^3(x) = f(f(f(x)))$.
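Condition (3) can be watched in action on a concrete map. The example below uses the tent map on [0, 1], a standard illustration that is not discussed in the text above; the names `tent` and `separations` are illustrative, and exact fractions are used so that rounding plays no part.

```python
from fractions import Fraction

def tent(x):
    """The tent map on [0, 1]: 2x on the left half, 2 - 2x on the right half."""
    return 2 * x if x < Fraction(1, 2) else 2 - 2 * x

def separations(x, y, steps):
    """Distances |f^n(x) - f^n(y)| for n = 0..steps."""
    out = [abs(x - y)]
    for _ in range(steps):
        x, y = tent(x), tent(y)
        out.append(abs(x - y))
    return out

x0 = Fraction(1, 3)                    # lands on the fixed point 2/3 after one step
y0 = x0 + Fraction(1, 2**20)           # a neighbour about one millionth away
seps = separations(x0, y0, 20)
print([str(s) for s in seps[15:]])

# A periodic orbit of period 2, for comparison with condition (2).
assert tent(tent(Fraction(2, 5))) == Fraction(2, 5)
```

The distance doubles at every step while both orbits stay on the same branch of the map, so a gap of 2^-20 reaches 1/4 after 18 iterations.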

One of the ironies of this definition is that, the more popularly understood each 
hypothesis is, the more redundant it is in relationship to the other two. 

For example, sensitive dependence is a condition which is easily understood by 
mathematicians and non-mathematicians alike. It has even been dubbed "the
butterfly effect" in popular literature such as Jurassic Park [3] and The
Mathematical Tourist [7]; the phrase probably dates back to the Ray Bradbury story
“A Sound of Thunder”, in which a time-traveller changes the course of history by 
stepping on a prehistoric butterfly [2]. This condition embodies the essence of 
chaos—the utter unpredictability of what ought to be simple systems—and so 
there is something popularly pleasing about requiring sensitive dependence on 
initial conditions. 

However, an elegant paper by Banks, Brooks, Cairns, Davis, and Stacey [1] 
demonstrated that sensitive dependence is assured whenever the function displays 
transitivity and dense periodic points. That is, despite its popular appeal, sensitive 
dependence is mathematically redundant—so that in fact, chaos is a property 
relying only on the topological, and not on the metric, properties of a space. 

The requirement that periodic points be dense is slightly less intuitive than 
requiring sensitive dependence, but it appeals to those who look for patterns 
within a seemingly random system. Mathematicians in particular instinctively seek
symmetry, and the wealth of periodicities within a chaotic system is a wonderful
mathematical phenomenon. It even allows us to explain, somewhat mystically, that
"there is order within chaos". Accordingly, the search for periodic points is an
understandable one.


I am grateful to Bob Gethner, who got me interested in the questions that are asked in this paper,

On the other hand, Vellekoop and Berglund [8] recently gave a simple proof of
an already-known theorem which says that, on any finite or infinite interval in the
real line, dense periodic points (and hence chaos) follow directly from the
condition of transitivity. Moreover, they gave examples which demonstrated that
neither dense periodic points nor sensitive dependence is enough to ensure any of 
the other conditions leading to chaos. Therefore, in one dimension both sensitive 
dependence and dense periodic points are redundant hypotheses in the definition 
of chaos. 

This leaves us only the study of the transitivity hypothesis, which is required 
both for historical reasons and for the strength of the condition. Still, it has less 
intuitive justification—it is harder to explain in nonmathematical terms, and even 
once it is explained, it seems to follow (morally, although not mathematically) from 
the sensitivity hypothesis, as both of these hypotheses say that, starting with just 
about any data, one could eventually get just about any answer. The purpose of 
this paper is to ask, "why transitivity? why not something else?" and to provide
some conditions which might play the same role as transitivity, but which are 
slightly more intuitive. 


2. A POSSIBLE ALTERNATIVE TO TRANSITIVITY. Perhaps, instead of transi- 
tivity, a more philosophically satisfying hypothesis might be one of the following: 


Definition. A function $f: M \to M$ is weakly blending if, for any pair of non-empty
open sets U and V in M, there is some $k > 0$ so that $f^k(U) \cap f^k(V) \neq \emptyset$. We say
f is strongly blending if, for any pair of non-empty open sets U and V in M, there is
some $k > 0$ so that $f^k(U) \cap f^k(V)$ contains a non-empty, open subset.

These conditions initially struck the author as an intuitive counterpart to 
sensitive dependence: sensitive dependence on initial conditions thrusts nearby 
points apart (for the same iterate of f), and blending pulls far away points 
together (again, for the same iterate of f)! 


Blending has certain obvious disadvantages when compared with transitivity. 
First and foremost, any function which is blending cannot be a homeomorphism,
which automatically excludes the study of many interesting multi-dimensional
chaotic systems, such as the horseshoe map [4, pp. 180-189]. Moreover, even in
low dimensions, functions which are blending are not necessarily transitive, and 
transitive functions are not necessarily blending. Consider the following two 
examples: 


Example 1. $f: S^1 \to S^1$, given by $f(\theta) = \theta + k$, where $k/\pi$ is irrational. This
function is rigid, irrational rotation; it is transitive but not strongly or weakly
blending.


Example 2. Any continuous piecewise linear function $f: [-1, 1] \to [-1, 1]$
satisfying:


• $|f'(x)| > 2$ everywhere except at the vertices of f; and


• each vertex of the graph of the function lies alternately on the line $y = x/2$ and
$y = -x/2$ (see the figure below).

Figure 1. Graph of Example 2.


This function is clearly not transitive; in fact each set is mapped closer to the origin
than it had been. At the same time, the large slope of f ensures that if neither J
nor f(J) contains 0, then f(J) is longer than J. This ensures that every interval is
mapped, in a finite number of iterations, to an interval which contains a
neighborhood of the origin, the only fixed point. Therefore, the function is strongly
blending.

However, a common characteristic of these two examples is that neither has 
dense periodic points—in fact, the first example has no periodic points at all, and 
the second example has a lone fixed point. If we include dense periodic points, 
then the ideas of transitivity and blending in our everyday one-dimensional 
experience have quite a strong overlap, especially when one is considering chaos. 
This can be seen in Theorems 1 and 2, which are the main theorems of the paper.
They show that if periodic points are dense and there is a strongly repelling fixed
point, then strong blending $\Rightarrow$ transitivity $\Rightarrow$ weak blending.


3. THE MAIN THEOREMS OF THIS PAPER. One-dimensional dynamical systems
are well understood nowadays, and so there is a wealth of theory on the
subject. However, the following theorems will be proved with simpler tools:
the link between open sets and continuous functions, the incredible strength of
the compactness condition, and induction arguments. These simple proofs are
possible because the conditions of transitivity and blending are both topological;
the proofs in this section contain many of the ideas that one finds in a point-set
topology or an introductory real analysis course.
The easier of the two theorems to prove is: 


Theorem 1. Let M be a subset of $\mathbb{R}^n$, and $f: M \to M$ a continuous function with
dense periodic points. Then if f is strongly blending, f is also transitive. 


Proof of Theorem 1. We assume that f is strongly blending and that periodic points
of f are dense. Pick two non-empty open sets, U and V. Because of the blending
property we know that there is some number $k > 0$ and some non-empty open set
$N \subset M$ so that $N \subset f^k(U) \cap f^k(V)$.

For the sake of convenience of notation, we'll let $\tilde{V} = f^{-k}(N) \cap V$; $\tilde{V}$ is the set
of points in V which "blend" with those in U.

By the continuity of f, $\tilde{V}$ is open, so our hypotheses allow us to pick a periodic
point $x \in \tilde{V}$; let us say that x is of period $p > k$ (it may be that p is a multiple of
the prime period of x).

Because of the way we chose $x \in \tilde{V}$, we know that $f^k(x) \in N$, and so there is
some $y \in U$ with $f^k(y) = f^k(x)$. From this, the simple computation

$$f^p(y) = f^{p-k}(f^k(y)) = f^{p-k}(f^k(x)) = f^p(x) = x$$

ensures that $x \in f^p(U) \cap V \neq \emptyset$. ∎


Remark. The assumption we make that N be open is a crucial one, and the
theorem does not hold without it. For a counterexample, consider the function

$$T(x) = \begin{cases} -(2x + 2) & \text{for } -1 \le x \le -\tfrac{1}{2}, \\ 2x & \text{for } |x| \le \tfrac{1}{2}, \\ 2 - 2x & \text{for } \tfrac{1}{2} \le x \le 1, \end{cases}$$

defined on the interval $[-1, 1]$. This function is an odd extension of the tent map;
its restriction to the interval [0, 1] is well known to be transitive (see for example
[5]). Accordingly, T has dense periodic points, and in fact every open interval in
the domain eventually maps onto an interval which contains the fixed point at the
origin, so that it is weakly blending. However, this function over the entire interval
$[-1, 1]$ is not transitive: the interval (0, 1) will never map onto any subinterval of
$(-1, 0)$.

Can one hope that the converse is also true: that chaos inevitably blends all sets 
together (strongly)? The answer is no, unfortunately, as one can see from the 
following. 


Example 3. We can flip the above function and get $F(x) = -T(x)$ on the interval
$[-1, 1]$. This is a lovely example of a chaotic function with periodic orbits of all
even periods, but no odd periods. (In fact, if $x_0$ is a periodic point of T with
period n, then $x_0$ is a periodic point of F with period 2n, so periodic points are
dense.) Examining a few iterates of this function will convince the reader that F is,
moreover, transitive. On the other hand, if U is an interval to the left of the origin,
and V is an interval to the right of the origin, no matter which iterate we examine
we will have $F^k(U) \cap F^k(V) = \emptyset$ or $\{0\}$. Therefore, F is only weakly blending.
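
The behavior of T and F described above can be probed with exact rational arithmetic. The following sketch is an illustration only, not part of the paper: it assumes T is the odd extension of the tent map from the Remark, and the sample points 2/3 and 2/7 as well as the helper names `orbit` and `period` are choices made here.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def T(x):
    """Odd extension of the tent map to [-1, 1]."""
    if x < -HALF:
        return -(2 * x + 2)
    if x <= HALF:
        return 2 * x
    return 2 - 2 * x

def F(x):
    """The flipped map F(x) = -T(x) of Example 3."""
    return -T(x)

def orbit(f, x, steps):
    """Return [x, f(x), ..., f^steps(x)]."""
    points = [x]
    for _ in range(steps):
        points.append(f(points[-1]))
    return points

def period(f, x, max_iter=64):
    """Least n >= 1 with f^n(x) == x, or None if none is found."""
    y = x
    for n in range(1, max_iter + 1):
        y = f(y)
        if y == x:
            return n
    return None

x = Fraction(2, 7)
print(period(T, x), period(F, x))            # period under T, then under F
print(all(y >= 0 for y in orbit(T, x, 20)))  # the T-orbit stays in [0, 1]
print([y > 0 for y in orbit(F, x, 5)])       # the F-orbit alternates sides
```

For these sample points the period doubles when passing from T to F, and the F-orbit switches sides of the origin at every step, which is the mechanism that keeps the images of U and V apart.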
However, a weaker converse is true: 


Theorem 2. Let I be a compact subset of $\mathbb{R}$, and $f: I \to I$ a continuous, transitive
function with a repelling fixed point $x_0$. Then f is weakly blending.


To prove this theorem, we will use two lemmas: 


Lemma 1. If f and $x_0$ are as given above, then $x_0$ has infinitely many eventual
pre-images in I.


Lemma 2. If f and $x_0$ are as given above, then the eventual pre-images of $x_0$ are dense
in I.

In fact, a much stronger version of these lemmas was proved three decades ago in
[6]: if f is a piecewise-monotone function, then the set

$$\{\, y \in I \mid f^k(y) = x \text{ for some } k \,\}$$

is dense in I for all $x \in I$. As this paper needs only the weaker lemmas (with the
weaker hypotheses), we will restrict our proofs accordingly.


Proof of Theorem 2. We wish to show that for any two open sets $U, V \subset I$, there is
some $n > 0$ with $f^n(U) \cap f^n(V) \neq \emptyset$. Lemma 2 tells us that the eventual preimages
of $x_0$ are dense, and so there exist $u \in U$, $v \in V$, and $j, k > 0$ so that
$f^j(u) = x_0 = f^k(v)$. Assume without loss of generality that $k \ge j$; then because $x_0$
is fixed, we have $f^k(u) = x_0 = f^k(v)$. Thus, $x_0 \in f^k(U) \cap f^k(V) \neq \emptyset$, and our
theorem is proved. ∎


Proof of Lemma 1. We will prove this lemma by induction. 

Suppose $x_0$ is our given repelling fixed point, and we are given a finite set
$X_n = \{x_{-n}, \ldots, x_{-1}, x_0\}$ with $f(x_k) = x_{k+1}$, $k = -n, \ldots, -1$. If $n = 0$, then we
have $X_0 = \{x_0\}$.

Choose an open set $U \subset I$ with $X_n \subset U$ satisfying

(1) if $y \in U$ then $f(y) \neq x_{-n}$ (unless $n = 0$ and $y = x_0$); and

(2) $f(U \setminus B_\varepsilon) \cap B_\varepsilon = \emptyset$.

(Here $B_\varepsilon$ is assumed to be the ball of radius $\varepsilon$ centered at $x_{-n}$.) In the case $n = 0$,
we use the fact that $x_0$ is repelling to satisfy the second of these two assumptions.

From here, we will use transitivity to show that f must send the exterior of the
set U arbitrarily close to $x_{-n}$: that is, for every $\varepsilon > 0$, $f(U^c) \cap B_\varepsilon \neq \emptyset$.

We can choose U sufficiently small that $U^c$ contains an open set. By the
transitivity of f, we know that $f^{k+1}(U^c) \cap B_\varepsilon \neq \emptyset$ for some $k \ge 0$; we're trying to
show that $k = 0$.

Let Y be the set of points which start in the complement of U and which are
first mapped into $B_\varepsilon$ on the (k + 1)st iteration. That is,

$$Y = \{\, y \in U^c \mid f^{k+1}(y) \in B_\varepsilon;\ f^j(y) \notin B_\varepsilon \text{ if } 1 \le j \le k \,\}.$$

Then clearly $f^k(Y) \cap B_\varepsilon = \emptyset$. Moreover, we must have $f^k(y) \in U^c$ if $y \in Y$, for
if it were otherwise, assumption (2) would give us

$$f^{k+1}(y) = f(f^k(y)) \in f(U \setminus B_\varepsilon) \subset (B_\varepsilon)^c$$

for some $y \in Y$. This contradicts the definition of Y. Therefore, we see that
$f^k(Y) \subset U^c$ and that $f(f^k(Y)) \cap B_\varepsilon \neq \emptyset$, so $f^k(Y)$ is the subset of $U^c$ which
proves our claim.

The rest of the proof of Lemma 1 follows easily, for the claim holds regardless
of the size of $\varepsilon$, and therefore the compactness of $U^c$ tells us that there is some
point y in $U^c$ with $f(y) = x_{-n}$.

This argument gives us an infinite sequence $\{x_{-k}\}_{k=0}^{\infty}$ with $f^k(x_{-k}) = x_0$, and so
completes the proof of Lemma 1. ∎


Proof of Lemma 2. Let $X = \{\, y \in I \mid f^k(y) = x_0 \text{ for some } k \,\}$. We want to show that
X is dense in I. Because f is transitive, it follows that if X is anywhere dense, then
X must be everywhere dense. Let's assume the opposite: that X is totally
disconnected.

If such is the case, then $X^c$ must be open, so we can write $X^c = \bigcup_{i=1}^{\infty} I_i$, where
the $I_i$'s are distinct, open intervals in I. Lemma 1 notes that X is infinite, so, in
fact, there must be an infinite number of such intervals.

Note, moreover, that at least one interval has $x_0$ as an endpoint; call this
interval $I_1$. Because $x_0$ is fixed, we have $f^2(I_1) \cap I_1 \neq \emptyset$; in fact, because of the
construction of the $I_i$'s, we have $f^2(I_1) \subset I_1$.

On the other hand, transitivity prohibits exactly such a cycle, for $I_1$ must visit
each of the infinite number of intervals, a contradiction. This contradiction arose
from assuming that X is not dense in I, and so our final lemma is proved. ∎
Harvard Calculus
at Oklahoma State University 


Kerry Johnson 


Calculus reform is being discussed throughout the mathematics community. Part of 
the reason that the reform movement has so much momentum comes from the fact 
that annually 300,000 students enroll in “engineering” calculus and only 140,000 
finish the year with a grade of D or higher [4]. Part of the reason involves money: 
The National Science Foundation has awarded nearly $11 million to the
reform movement [3].

The calculus reform movement began in the late 1980s following a wave of
general-education reforms [7]. Oklahoma State University jumped on the reform 
bandwagon in the Fall of 1992. The OSU Mathematics Department was given an 
NSF grant to try the calculus materials written by the Consortium based at 
Harvard and to disseminate the results of its efforts through conferences. Since 
this was experimental, the department offered two types of courses: one using the 
Harvard materials [1, 2] and one using a standard text [6]. This provided an 
opportunity to compare the two groups of calculus students. 

Calculus at OSU is taught in two 15-week semesters. Calculus 1 covers
single-variable differentiation and integration topics, and Calculus 2 deals with series and
multivariate calculus. 

Here are some natural questions to ask: 


1. Do Harvard students make better grades in calculus than traditional calculus 
students? 

2. Are Harvard students more likely to enroll in subsequent mathematics courses? 

3. Do Harvard students perform better in subsequent mathematics courses than other 
students? 

4. How do students that go from Harvard Calculus 1 into Standard Calculus 2 
perform? 


The data used in answering these questions come from the Fall 1992 through
Spring 1994 semesters and cover all students who enrolled in calculus
during these semesters.


Question 1: Do Harvard students make better grades in calculus than traditional 
calculus students? 


Yes. A higher percentage of the Harvard students than of the traditional students
pass the course and make a C or better. For example, 67 percent of the
Harvard Calculus 1 students made a C or better in Calculus 1, while only 62
percent of the traditional students did. In Calculus 2,
80 percent of the Harvard students made a C or better, whereas only 71 percent
of the traditional students did. In addition, these results are fairly
stable because of the number of students involved (more than 110 students in each
of the four cases).


Question 2: Are Harvard students more likely to enroll in subsequent mathematics 
courses? 


That depends on the students’ major. For example, engineering students at 
OSU are required to take mathematics courses through Differential Equations. 
The breakdown of the students' home colleges (e.g., Engineering, Agriculture,
Business, etc.) for Harvard students and traditional students turns out to be very
similar, however.

The largest difference in the enrollments was in Calculus 2. For students who 
made a D or better in Calculus 1, 63 percent of the Harvard students took 
Calculus 2 while only 56 percent of the traditional students took the course. 
However, more of the Harvard students switched to traditional Calculus 2 (44 
percent) than traditional students switched to Harvard Calculus (18 percent). 

Enrollments in Differential Equations were similar (Harvard 36 percent,
traditional 33 percent), while enrollments in Linear Algebra differed by about 7 percent
(Harvard 20 percent, traditional 27 percent). These percentages may change when
enough time has passed for both groups to take these courses.


Question 3: Do Harvard students perform better in subsequent mathematics courses 
than other students? 


The simple answer is no. Harvard students do not seem to do as well in 
subsequent mathematics courses as their counterparts from the traditional course. 
Only 45 percent of the 120 students that had a D or better in Harvard Calculus 1
were able to make the same grade or better in Calculus 2, whereas 53 percent of
the 227 traditional students that made a D or better in Calculus 1 maintained
their grade in Calculus 2. Of these students, 70 percent of the Harvard students
made a C or better in Calculus 2 while 82 percent of the traditional students made 
a C or better in Calculus 2. One reason for this is that several of the Harvard 
students took traditional Calculus 2 rather than Harvard Calculus 2. As the table 
under Question 4 indicates, this is not the best possible combination of courses. 

The numbers for higher-level courses are too small to reach any solid
conclusions, but the trend seems to be the same as for Calculus 2. In Differential
Equations, 50 percent of the D or better Harvard Calculus 2 students maintained
or improved their grades, while 58 percent of the traditional students did. In
Linear Algebra, 60 percent of the Harvard students made the same grade or better
than they made in Calculus 2, while 69 percent of the traditional students
maintained or improved their grades in Differential Equations. In addition, fewer
Harvard students made a C or better in Differential Equations and Linear Algebra
than did the traditional students. It is important to note that the comparisons in
both Differential Equations and Linear Algebra involve fewer than twenty Harvard
students and 81 traditional students.


Question 4: How do students that go from Harvard Calculus 1 into Standard Calculus 
2 perform? 


1995] HARVARD CALCULUS AT OKLAHOMA STATE 795 


TABLE 1

Calculus 1    Calculus 2    Total number    % that made a C or      % that made the same grade or better
                            of students     better in Calculus 2    in Calc 2 that they made in Calc 1

Traditional   Traditional
Traditional   Harvard
Harvard       Traditional
Harvard       Harvard


The table above summarizes the data. It includes only those students who
made a D or better in Calculus 1.

Based on the above information, traditional Calculus 1 students should take 
Harvard Calculus 2, and the Harvard Calculus 1 students should take Harvard 
Calculus 2. The worst possible mixture of the above courses is Harvard Calculus 1 
and traditional Calculus 2. This makes sense since traditional calculus focuses 
more on algebra and memorizing formulas while Harvard calculus focuses more on 
how these formulas can be applied to real-life problems and why they are true.

Of the students who switched from the traditional course into the Harvard 
course, 40 percent made an A in Calculus 1, 36 percent a B, and 20 percent a C. 
Of the students who switched from the Harvard course into the traditional course, 
only 21 percent made an A in Calculus 1, while 38 percent made a B, and 36 
percent made a C. Again this may be due to the algebraic focus of traditional 
calculus. Those students who switched from the Harvard course into the tradi- 
tional course may have been good with algebra and were expecting to get an A in 
Calculus 1. When they discovered that this was not the calculus they learned in 
high school, they switched. 


Conclusions. The overall trend seems to be that Harvard calculus students' grades
are better than traditional students' grades, yet traditional students tend to be more
successful in subsequent math courses. This is most likely due to Harvard students
moving from an application-based course to a more algebraically based course such
as traditional Calculus 2, Differential Equations, or Linear Algebra. It appears
that Harvard calculus does address the problem of students passing calculus, but
may not fully prepare students for algebraically based subsequent courses.

Another explanation for the higher grade levels in Harvard calculus is the 
novelty effect. Subjects show increased interest, motivation, or participation simply 
because they are doing something different [5]. Most likely, the increased grade 
levels are due to both the novelty effect and the Harvard materials. In any case, the 
grades in Harvard calculus do seem higher. 
The Stochastic Group


David G. Poole 


Consider this to be propaganda. We want to present a “new” example of a group 
and lobby for its inclusion in a first course in abstract algebra. Virtually all abstract 
algebra textbooks include the standard examples of groups: the integers under
addition, the integers modulo n under addition modulo n, m × n matrices over
the reals (or the integers) under matrix addition, the group of symmetries of a
regular polygon, the symmetric group, various subgroups of the multiplicative
group of nonzero complex numbers, and so on. A biased sample of such books¹
reveals that, beyond these basic examples, there is no consensus on which further
examples to include. In this category we find the general linear group (and
subgroups thereof) [A], [BB], [E], [G], [Hu], the (one-dimensional) affine group [A],
[BB], [B], [Du], [H], [Hu], the unit group of the integers modulo n [BB], [D], [G],
[Hu], the power set of a set under the symmetric difference operator [BC], [Bu],
various groups of real functions [A], [Bu], [D], [Hu], the Möbius group [B], and a
host of abstract groups which we will attempt neither to identify nor enumerate.

Our intent here is not to supplant any of these examples but to add to the list an 
example which is concrete, natural (in our opinion), flexible (in the sense that it 
can be presented at several levels of sophistication and can grow with students as 
their mathematical maturity increases) and which affords an opportunity for 
students to do some genuine mathematical exploration. Our example is based 
upon the notion of the stochastic matrix (or transition matrix) associated with a 
Markov chain. Many students will already have encountered this example in a 
linear algebra course; a slight modification of the usual definition allows us to 
approach it from a group-theoretic point of view. 


Definition. A stochastic matrix over a field F is a square matrix with entries from 
F with the property that the entries in each of its columns add up to 1. 


If F is an ordered field, then we may define a positively stochastic matrix to be a 
stochastic matrix with nonnegative entries—this is what is usually called a stochas- 
tic matrix, in the case where F is the field of real numbers. We will not need this 
concept here. 

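¹The ones on my bookshelf.


798 THE STOCHASTIC GROUP [November


Our interest is in the set of all nonsingular stochastic matrices over F under the
operation of matrix multiplication. When F is a finite field, this produces nice
examples for students to explore. For instance, over $\mathbb{Z}_2$ there are only two
nonsingular stochastic 2 × 2 matrices,

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$

and they form a cyclic group of order 2 under multiplication. Over $\mathbb{Z}_3$, there are six
such matrices:

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\ \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix},\ \begin{bmatrix} 2 & 1 \\ 2 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 2 \\ 1 & 2 \end{bmatrix},\ \begin{bmatrix} 2 & 0 \\ 2 & 1 \end{bmatrix}.$$
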

Again, it can easily be checked that they form a group under multiplication. After 
a few examples of this nature, most students will have been persuaded that there is 
a theorem lurking in the background and it is a nice exercise to see if they can 
formulate and prove it. While the 2 × 2 case can be done by "brute force" without
too much difficulty (restrict F to be the rationals or the reals if fields are
unfamiliar), most students will not want to contemplate even the 3 × 3 case, much
less the general case. They should therefore be suitably impressed by the elegance
and generality of the following proof once they have covered a few preliminary 
results on group actions. 


Theorem A. The set S of all nonsingular stochastic n × n matrices over a field F forms
a group under matrix multiplication.


Proof: Consider S as a subset of GL(n, F), the general linear group of all
nonsingular n × n matrices over F. Write the elements of $F^n$, where $F^n$ denotes
the vector space of all n-tuples over F, as row vectors and let GL(n, F) act on $F^n$
as matrix multiplication from the right. Let $j = (1, 1, \ldots, 1) \in F^n$ and recall that
for any n × n matrix M over F, jM has the column sums of M as its coordinates.
It is clear from the definition of stochastic matrix that S is the stabilizer of j and
hence is a subgroup of GL(n, F) [G, Exercise 5.27].
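
For small prime fields the statement of Theorem A can be confirmed by brute force, in the spirit of the exploration suggested above. The following is a minimal sketch, not the author's method: it assumes F = Z_p with p prime, handles only n = 2 and n = 3, and the function names are hypothetical.

```python
from itertools import product

def det_mod(m, p):
    """Determinant of a 2x2 or 3x3 matrix (tuple of rows) modulo p."""
    if len(m) == 2:
        return (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % p
    (a, b, c), (d, e, f), (g, h, i) = m
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % p

def stochastic_group(n, p):
    """All nonsingular n x n matrices over Z_p whose columns each sum to 1."""
    cols = [c for c in product(range(p), repeat=n) if sum(c) % p == 1]
    group = set()
    for choice in product(cols, repeat=n):
        rows = tuple(zip(*choice))  # transpose the chosen columns into rows
        if det_mod(rows, p) != 0:
            group.add(rows)
    return group

def mat_mul(a, b, p):
    """Matrix product modulo p."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )

def is_closed(group, p):
    """A nonempty finite subset of GL(n, F) closed under products is a subgroup."""
    return all(mat_mul(a, b, p) in group for a in group for b in group)

for n, p in [(2, 2), (2, 3), (2, 5), (3, 2)]:
    g = stochastic_group(n, p)
    print(n, p, len(g), is_closed(g, p))
```

Each printed line lists n, p, the number of matrices found, and whether the set is closed under multiplication.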


Definition. The group of all nonsingular stochastic n × n matrices over a field F
is called the stochastic group of n × n matrices over F and is denoted by S(n, F).
If p is prime and F is the Galois field of order $q = p^m$, we write S(n, q) instead of
S(n, F).


The next problem which naturally arises is to determine the structure of the 
stochastic group—can we classify it in terms of known groups? The extent to which 
this can be discussed in an undergraduate course depends upon the level of the 
course and the background of the students. But even in an introductory abstract 
algebra course, much can be done. For example, it is clear that $S(2,2) \cong \mathbb{Z}_2$ and
students should have little difficulty showing that $S(2,3) \cong S_3$, the symmetric


group on three elements (take |? 5] and |' (| as generators in the second 


example above). It takes a bit more effort to see that $S(2,4) \cong A_4$ and that
$S(3,2) \cong S_4$. Advanced undergraduates might discover that S(2,5) has the presentation
$\langle a, b \mid a^5 = b^4 = 1,\ ba = a^2 b \rangle$, from which it follows that $S(2,5) \cong \mathbb{Z}_5 \rtimes_\theta \mathbb{Z}_4$,
the semidirect product of $\mathbb{Z}_5$ by $\mathbb{Z}_4$ with $\theta: \mathbb{Z}_4 \to \mathrm{Aut}(\mathbb{Z}_5)$ the homomorphism
determined by 0,(a) = a’ (take a = |° ‘| and b = E ‘|, for example). 
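
Generators of the kind described for S(2,5) can be found by a short search. The sketch below is an illustration under stated assumptions: it takes the relation in the form ba = a²b (one valid choice of exponent), works over Z_5, and reports whichever pair the search meets first, which need not be the pair the author had in mind. All names are hypothetical.

```python
from itertools import product

P = 5  # work over the field Z_5

def mul(a, b):
    """Product of two 2x2 matrices over Z_5, stored as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) % P for j in range(2))
        for i in range(2)
    )

IDENTITY = ((1, 0), (0, 1))

# Columns (s, 1 - s) and (t, 1 - t); the determinant is s - t, so s != t.
S25 = [((s, t), ((1 - s) % P, (1 - t) % P))
       for s, t in product(range(P), repeat=2) if s != t]

def order(m):
    """Multiplicative order of an invertible matrix."""
    n, x = 1, m
    while x != IDENTITY:
        x = mul(x, m)
        n += 1
    return n

def generated(gens):
    """Closure of the generators under multiplication (finite group)."""
    seen, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = mul(x, g)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

pairs = [(a, b) for a in S25 for b in S25
         if order(a) == 5 and order(b) == 4 and mul(b, a) == mul(mul(a, a), b)]
a, b = pairs[0]
print(len(S25), a, b, len(generated([a, b])))
```

If the search succeeds, the pair generates all of S(2,5), since a subgroup containing elements of orders 5 and 4 must have order divisible by 20.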

But before we lose sight of the forest for the trees, let us step back from these 
examples and ask ourselves if there really is any pattern here. To continue with the 
finite stochastic groups for a moment, we might first ask for the order of S(n, q). 
The proof of the next proposition is analogous to the corresponding derivation of 
the order of GL(n, q) [R, Theorem 8.10]. We begin with an easy lemma. 


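Definition. The set of all vectors in $F^n$ whose components add up to 1 is denoted
by $U(F^n)$.

Lemma. Let F be a field, and let $\alpha_1, \alpha_2, \ldots, \alpha_k \in U(F^n)$.

(i) If $\beta = \sum_{i=1}^{k} c_i \alpha_i$ for some $c_i \in F$, $1 \le i \le k$, then
$\beta \in U(F^n) \cap \operatorname{span}\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ if and only if $\sum_{i=1}^{k} c_i = 1$.

(ii) If F = GF(q) is the Galois field with q elements, and if $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ is linearly
independent, then

$$|U(F^n) \cap \operatorname{span}\{\alpha_1, \alpha_2, \ldots, \alpha_k\}| = q^{k-1}.$$


Proof:
(i) Let $\beta = \sum_{i=1}^{k} c_i \alpha_i \in \operatorname{span}\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$. Let $\alpha_i = (a_{i1}, a_{i2}, \ldots, a_{in})$ for
$1 \le i \le k$ and let $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$. Then

$$\sum_{i=1}^{k} c_i = \sum_{i=1}^{k} c_i \Bigl( \sum_{j=1}^{n} a_{ij} \Bigr) = \sum_{j=1}^{n} \sum_{i=1}^{k} c_i a_{ij} = \sum_{j=1}^{n} \beta_j.$$

Thus $\sum_{i=1}^{k} c_i = 1$ if and only if $\sum_{j=1}^{n} \beta_j = 1$.

(ii) Since $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ is linearly independent, there is a one-to-one
correspondence between k-tuples $(c_1, c_2, \ldots, c_k)$ and vectors
$\beta = \sum_{i=1}^{k} c_i \alpha_i \in \operatorname{span}\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$. By (i),
$\beta \in U(F^n) \cap \operatorname{span}\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ if and only if
$\sum_{i=1}^{k} c_i = 1$, so there are $q^{k-1}$ distinct $\beta$'s (for we can choose the first $k - 1$ $c_i$'s
arbitrarily and then set $c_k = 1 - \sum_{i=1}^{k-1} c_i$).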


Proposition. $|S(n, q)| = q^{n-1}(q^{n-1} - 1)(q^{n-1} - q) \cdots (q^{n-1} - q^{n-2})$.


Proof: Let $A \in S(n, q)$. Since A is nonsingular, the columns of A form an
ordered basis $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ of $F^n$ in $U(F^n)$. There are $q^{n-1}$ choices for $\alpha_1$ (the
last component being determined by the criterion that the components add up to
1). There are then $q^{n-1} - 1$ choices for $\alpha_2$ since we must rule out any vector of
$U(F^n)$ in the span of $\alpha_1$ (which, in this case, is just $\alpha_1$ itself). Next, $\alpha_3$ is any vector in
$U(F^n)$ which is not in the span of $\alpha_1$ and $\alpha_2$. By the lemma, this means there are
$q^{n-1} - q$ choices for $\alpha_3$. In general, the lemma implies that there are $q^{n-1} - q^{i-2}$
choices for $\alpha_i$, $i \ge 2$. The result follows.
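
The order formula can be compared with a direct count for small cases. This is a sketch under the assumption that q = p is prime (so that arithmetic modulo p gives the field); `count_stochastic` and `order_formula` are hypothetical helper names, and the Leibniz determinant is only practical for small n.

```python
from itertools import permutations, product

def det_mod(rows, p):
    """Leibniz-formula determinant modulo p (fine for small n)."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        # Sign of the permutation from its inversion count.
        inversions = sum(perm[i] > perm[j]
                         for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= rows[i][perm[i]]
        total += term
    return total % p

def count_stochastic(n, p):
    """Brute-force count of nonsingular column-stochastic n x n matrices over Z_p."""
    cols = [c for c in product(range(p), repeat=n) if sum(c) % p == 1]
    return sum(1 for choice in product(cols, repeat=n)
               if det_mod(tuple(zip(*choice)), p) != 0)

def order_formula(n, q):
    """q^(n-1) * (q^(n-1) - 1) * (q^(n-1) - q) * ... * (q^(n-1) - q^(n-2))."""
    size = q ** (n - 1)
    for i in range(2, n + 1):
        size *= q ** (n - 1) - q ** (i - 2)
    return size

for n, p in [(2, 2), (2, 3), (2, 5), (3, 2), (3, 3), (4, 2)]:
    print(n, p, count_stochastic(n, p), order_formula(n, p))
```

The two counts in each printed row should coincide.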


The problem of classifying the stochastic groups still remains. The examples 
given above do not suggest an obvious pattern so the following result comes as a 
pleasant surprise. 


Theorem B. For any field F, $S(n, F) \cong \mathrm{Aff}(n - 1, F)$ for all $n \ge 2$.


Recall that the affine group Aff(n - 1, F) consists of all mappings $\alpha: F^{n-1} \to F^{n-1}$
of the form $\alpha(x^T) = Ax^T + b^T$, $x \in F^{n-1}$, where $A \in GL(n - 1, F)$ and
$b \in F^{n-1}$, together with the operation of composition. The monomorphism
$\psi: \mathrm{Aff}(n - 1, F) \to GL(n, F)$ defined by $\psi(\alpha) = \begin{bmatrix} A & b^T \\ 0 & 1 \end{bmatrix}$ maps Aff(n - 1, F)
isomorphically onto the subgroup of GL(n, F) consisting of all nonsingular matrices
whose last row is [0, 0, ..., 0, 1] [R]. We will identify Aff(n - 1, F) with this
subgroup.


Proof: For $n \ge 2$, let $Q = \begin{bmatrix} I & 0 \\ j & 1 \end{bmatrix} \in GL(n, F)$, where I is the (n - 1) × (n - 1)
identity matrix and $j = [1, 1, \ldots, 1] \in F^{n-1}$. Define $\varphi \in \mathrm{Aut}(GL(n, F))$ to be the
inner automorphism determined by Q; that is, for $X \in GL(n, F)$, $\varphi(X) = QXQ^{-1}$.
Observe that $Q^{-1} = \begin{bmatrix} I & 0 \\ -j & 1 \end{bmatrix}$.

We claim that S(n, F) and Aff(n - 1, F) are in fact conjugate subgroups of
GL(n, F) under $\varphi$. Let $M \in S(n, F)$ and partition M as $\begin{bmatrix} N & u^T \\ \nu & \mu \end{bmatrix}$, where
$N \in M_{n-1}(F)$, $u, \nu \in F^{n-1}$, $\mu \in F$ and $jN + \nu = j$, $ju^T + \mu = 1$. Then

$$\varphi(M) = QMQ^{-1} = \begin{bmatrix} N - u^T j & u^T \\ 0 & 1 \end{bmatrix}$$

and, since $\varphi$ is inner, $\det \varphi(M) = \det M \neq 0$. Hence $N - u^T j \in GL(n - 1, F)$
and so $\varphi(M) \in \mathrm{Aff}(n - 1, F)$.

Conversely, if $P = \begin{bmatrix} A & b^T \\ 0 & 1 \end{bmatrix} \in \mathrm{Aff}(n - 1, F)$ then, setting $M = Q^{-1}PQ$, we find
that

$$M = \begin{bmatrix} A + b^T j & b^T \\ j - jA - jb^T j & 1 - jb^T \end{bmatrix}.$$

Again, $\det M = \det P \neq 0$ and, checking the column sums, we find that

$$j(A + b^T j) + (j - jA - jb^T j) = j \quad \text{and} \quad jb^T + (1 - jb^T) = 1.$$

Thus M is stochastic and, since $\varphi(M) = P$, it follows that $\varphi$ maps S(n, F)
isomorphically onto Aff(n - 1, F) as claimed.
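
The conjugation in the proof can be checked mechanically over a small prime field. The sketch below assumes F = Z_p and takes Q to be the identity matrix with its last row replaced by all ones, with inverse as in the proof; it tests every n × n matrix, singular or not, since conjugation preserves nonsingularity anyway. Function names are hypothetical.

```python
from itertools import product

def mat_mul(a, b, p):
    """Matrix product modulo p; matrices are tuples of row tuples."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) % p for j in range(n))
        for i in range(n)
    )

def q_matrices(n, p):
    """Q = [[I, 0], [j, 1]] and its inverse [[I, 0], [-j, 1]] over Z_p."""
    q = [[int(i == k) for k in range(n)] for i in range(n)]
    q_inv = [row[:] for row in q]
    for k in range(n - 1):
        q[n - 1][k] = 1
        q_inv[n - 1][k] = (-1) % p
    return tuple(map(tuple, q)), tuple(map(tuple, q_inv))

def is_stochastic(m, p):
    """True if every column of m sums to 1 modulo p."""
    n = len(m)
    return all(sum(m[i][k] for i in range(n)) % p == 1 for k in range(n))

def has_affine_last_row(m):
    """True if the last row is [0, ..., 0, 1]."""
    return m[-1] == (0,) * (len(m) - 1) + (1,)

def check(n, p):
    """True if M is stochastic exactly when Q M Q^{-1} has last row [0, ..., 0, 1]."""
    q, q_inv = q_matrices(n, p)
    for entries in product(range(p), repeat=n * n):
        m = tuple(tuple(entries[i * n:(i + 1) * n]) for i in range(n))
        image = mat_mul(mat_mul(q, m, p), q_inv, p)
        if is_stochastic(m, p) != has_affine_last_row(image):
            return False
    return True

print(check(2, 5), check(3, 2), check(3, 3))
```

A result of True for each case says that conjugation by Q matches column-stochastic matrices with matrices whose last row is [0, ..., 0, 1], as the proof asserts.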


One of the nicest advantages of this representation of the affine group is that 
subgroups are easy to find, or perhaps we should say easy to name: just add the 
adjective “stochastic” to the name of any subgroup of the general linear group. 
Thus, we obtain the special stochastic group (stochastic matrices with determinant 
1), the upper triangular stochastic group, and so on. We might also consider the 
doubly stochastic group consisting of all nonsingular stochastic matrices whose rows 
also sum to 1. The possibilities are endless; hopefully, enterprising readers will 
decide to continue this investigation. 


ACKNOWLEDGMENT. I would like to thank the referee for his helpful comments concerning this 
paper. In particular, I am indebted to him for suggesting an improved version of the Lemma. 
A Story of Binomial Coefficients and
Primes 


J. W. Sander 


1. INTRODUCTION. Primes, the atoms of the integers, naturally have a long 
history in mathematics. Around 1800, Gauss and Legendre made rather precise 
conjectures about the asymptotic behavior of the function $\pi(x)$ counting the
primes up to x. The first mathematician who proved any worthwhile results about
this behavior was Chebyshev in 1851/52 (see [1], paragraph 7). He showed that

$$C_1 \frac{x}{\log x} < \pi(x) < C_2 \frac{x}{\log x} \tag{1}$$

for some constants $0 < C_1 < 1 < C_2$. It is well known that in fact

$$\pi(x) = \frac{x}{\log x} + r(x),$$

where the error term r(x) is of smaller order than the main term x/log x. This
result, the so-called prime number theorem, was proved independently by
Hadamard and de la Vallée-Poussin at the end of the 19th century using complex
analysis. The bounds in (1), however, can be deduced in a very elementary way.
Surprisingly (at first), the middle binomial coefficients $\binom{2n}{n}$ are an appropriate
tool. Why is this so?

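Let us write down $\binom{2n}{n}$ in its prime factor decomposition for a few small (but
not too small) values of $n$:

$$\begin{aligned}
\binom{120}{60} &= 2^4 \cdot 3^2 \cdot 7 \cdot 13 \cdot 17 \cdot 23 \cdot 31 \cdot 37 \cdot 61 \cdot 67 \cdot 71 \cdot 73 \cdot 79 \cdot 83 \cdot 89 \cdot 97 \cdot 101 \cdot 103 \cdot 107 \cdot 109 \cdot 113 \\
\binom{122}{61} &= 2^5 \cdot 3^2 \cdot 7 \cdot 11^2 \cdot 13 \cdot 17 \cdot 23 \cdot 31 \cdot 37 \cdot 67 \cdot \ldots \cdot 113 \\
\binom{124}{62} &= 2^5 \cdot 3^3 \cdot 7 \cdot 11^2 \cdot 13 \cdot 17 \cdot 23 \cdot 37 \cdot 41 \cdot 67 \cdot \ldots \cdot 113 \\
\binom{126}{63} &= 2^6 \cdot 3 \cdot 5^3 \cdot 11^2 \cdot 13 \cdot 17 \cdot 23 \cdot 37 \cdot 41 \cdot 67 \cdot \ldots \cdot 113 \\
\binom{128}{64} &= 2 \cdot 3 \cdot 5^3 \cdot 11^2 \cdot 13 \cdot 17 \cdot 23 \cdot 37 \cdot 41 \cdot 67 \cdot \ldots \cdot 113 \cdot 127
\end{aligned}$$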


The dots indicate the presence of all the primes in the given range. 
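
The factorizations above are easy to recompute. The following is a minimal sketch, not part of the original note; the function name `factor_central_binomial` is an illustrative choice. It relies only on the fact that every prime factor of $\binom{2n}{n}$ is at most $2n$, so trial division by $2, 3, \ldots, 2n$ suffices.

```python
from math import comb

def factor_central_binomial(n):
    """Return {prime: exponent} for C(2n, n) by trial division.

    Dividing by d = 2, 3, ..., 2n in increasing order only ever succeeds
    for primes, because the prime factors of a composite d have already
    been removed when d is reached.
    """
    value = comb(2 * n, n)
    factors = {}
    for d in range(2, 2 * n + 1):
        while value % d == 0:
            factors[d] = factors.get(d, 0) + 1
            value //= d
    assert value == 1  # no prime above 2n divides C(2n, n)
    return factors

for n in (60, 61, 62, 63, 64):
    print(2 * n, n, factor_central_binomial(n))
```

Running the loop prints one dictionary per row of the table, which makes it easy to see which primes occur to a power higher than one.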

The first observation is that these integers have many prime factors and at the
same time are "almost" squarefree, where "squarefree" means that an integer has
no repeated prime factor. A closer look reveals that every prime $p$, $n < p < 2n$,
divides $\binom{2n}{n}$ exactly once. This is clear, since each such $p$ divides the numerator
$(2n)!$ of $\binom{2n}{n}$ once, but cannot divide $n!$ and therefore is no factor of the
denominator. One may also notice that the primes $p$, $2n/3 < p \le n = 2n/2$, do
not divide $\binom{2n}{n}$ (this is because $p$ and $2p$, but not $3p$, are factors of $(2n)!$, and $p$
divides $n!$), and this pattern continues.

The main idea for proving Chebyshev's result (1) now is to use $\binom{2n}{n}$ as an
approximation for $\prod_{p \le 2n} p$, where $p$ runs through the sequence of primes. The
size of $\binom{2n}{n}$ in turn can be estimated by Stirling's well-known formula for factorials.
Taking logarithms and replacing $2n$ by $x$, we obtain

$$x \approx \log \prod_{p \le x} p = \sum_{p \le x} \log p \approx \pi(x) \log x. \tag{2}$$

The last asymptotic equality in (2) is derived by partial summation, the simple
sum-analogue of partial integration.

One may ask for the advantage of $\binom{2n}{n}$ over $\prod_{p \le 2n} p$. The answer is that
binomial coefficients, in addition to their multiplicative properties (which reflect
their mimicking products of primes), also have a nice additive property, namely
their recursion formula. This provides the chance to use induction, which actually
is done in the course of the proof.

Paul Erdős certainly has been and still is asking the most questions in number
theory (and other areas of mathematics as well as life in general). He came across
the beautiful idea to use binomial coefficients for a proof of Chebyshev's theorem
in the 1930's. The importance of this result is but one reason why one likes to know
as much as possible about the multiplicative structure of binomial coefficients.
Therefore, it is no surprise that Erdős conjectured that for $n > 4$, the integers $\binom{2n}{n}$
are never squarefree, although they seem to be "almost squarefree". He even
asked the more general question: Given a positive integer $a$, do we always find
some prime $p$ such that $p^a \mid \binom{2n}{n}$, if $n$ is sufficiently large?


In 1985, Sárközy [10] proved that $\binom{2n}{n}$ is never squarefree for sufficiently
large $n > n_0$, thus answering Erdős' first question in the affirmative for large $n$. In
1988, Goetgheluck ([6], [7]) gave a numerical verification of this conjecture for all
n < 24705184: we remark that if $n$ is not a power of 2, then $4 \mid \binom{2n}{n}$ (as the reader
may verify by use of Lemma 1(i) below); since 4 is a square, we only have to take
care of those $n$ of type $n = 2^k$. Recently Velammal [11] proved Sárközy's theorem
with an explicit constant $n_0 = 2^{8000}$. Independently, a similar result was obtained
by Granville and Ramaré [8]. By checking the finitely many $\binom{2n}{n}$ for all $n = 2^k$,
$2 < k < 8000$, this confirms Erdős' first conjecture. The general question has been
answered in the affirmative by the author [9] in 1992. These results depend on
estimates of so-called exponential sums, a deep and useful tool in analytic number
theory.

In this note, we shall deal with these problems in a purely elementary manner,
obtaining, however, weaker results. We define for a positive integer $a$ and a
prime $p$

$$E_{a,p}(N) = \operatorname{card}\left\{ n : 0 \le n < N,\ p^a \nmid \binom{2n}{n} \right\}$$

and

$$E_a(N) = \operatorname{card}\left\{ n : 0 \le n < N,\ p^a \nmid \binom{2n}{n} \text{ for all primes } p \right\}.$$

$E_a(N)$ is the number of exceptions $n$, $0 \le n < N$, to Erdős' conjecture for prime
powers $p^a$. The results of Sárközy [10] and the author [9] show that $E_a(N)$ is
bounded for every fixed $a$, i.e. the exceptional set of Erdős' problems is always
finite.

Our elementary method leads to


Theorem 1. (i) For integers $a \ge 1$ and $k \ge 0$,

$$E_{a,2}(2^k) = \sum_{t=0}^{a-1} \binom{k}{t},$$

i.e., in the set

$$\left\{ \binom{2n}{n} : 0 \le n < 2^k \right\}$$

of the first $2^k$ middle binomial coefficients, there are exactly $\sum_{t=0}^{a-1} \binom{k}{t}$ elements not
divisible by $2^a$.

(ii) For integers $a \ge 1$, $k \ge 0$, and a prime $p \ge 3$,

$$E_{a,p}(p^k) \le \left(\frac{p+1}{2}\right)^k \sum_{t=0}^{a-1} \binom{k}{t}.$$

Note that if $a > k$, then $\sum_{t=0}^{a-1} \binom{k}{t} = 2^k$, which is the cardinality of the set in (i).
Therefore, $2^a$ divides none of the coefficients in the set.
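
Both parts of Theorem 1 can be checked by brute force for small parameters. The sketch below is an illustration only, with hypothetical helper names `exceptions` and `binomial_partial_sum`; it counts the exceptions directly from the definition of $E_{a,p}(N)$ and compares them with $\sum_{t=0}^{a-1}\binom{k}{t}$ for $p = 2$, respectively with the upper bound $\left(\frac{p+1}{2}\right)^k \sum_{t=0}^{a-1}\binom{k}{t}$ for an odd prime.

```python
from math import comb

def exceptions(a, p, N):
    """E_{a,p}(N): number of n with 0 <= n < N and p^a not dividing C(2n, n)."""
    return sum(1 for n in range(N) if comb(2 * n, n) % p ** a != 0)

def binomial_partial_sum(k, a):
    """sum_{t=0}^{a-1} C(k, t); math.comb returns 0 when t > k."""
    return sum(comb(k, t) for t in range(a))

# Part (i): equality for p = 2 and N = 2^k.
for a in range(1, 4):
    for k in range(0, 7):
        print(a, k, exceptions(a, 2, 2 ** k), binomial_partial_sum(k, a))

# Part (ii): upper bound for p = 3 and N = 3^k.
p = 3
for a in range(1, 4):
    for k in range(0, 5):
        bound = ((p + 1) // 2) ** k * binomial_partial_sum(k, a)
        print(a, k, exceptions(a, p, p ** k), "<=", bound)
```

The direct count is only feasible for small $k$, since it evaluates every $\binom{2n}{n}$ with $n < p^k$.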


Theorem 2. (i) For any integer $a \ge 1$, we have

$$E_{a,2}(N) \le C_a (\log N)^{a-1},$$

where $C_a$ is a constant depending only on $a$.

(ii) Let $a \ge 1$ be an integer and $p$ a prime. For any $\varepsilon > 0$ and sufficiently large $N$,

$$E_{a,p}(N) < N^{\gamma_p + \varepsilon},$$

where

$$\gamma_p = \frac{\log \frac{p+1}{2}}{\log p}.$$

For any prime $p$ and $a \ge 1$, we may choose $\varepsilon$ small enough that $\gamma_p + \varepsilon < 1$.
Therefore, (ii) immediately implies the following


Corollary. Let $a \ge 1$ be an integer and $p$ a prime. Then

$$\lim_{N \to \infty} \frac{E_{a,p}(N)}{N} = 0.$$

We say that $E_{a,p}$ has asymptotic density 0. Since obviously $E_{a,p}(N) \ge E_a(N)$, we
conclude that the exceptional set in Erdős' problem has asymptotic density 0 as
well.
For references on this subject and related questions see [2], [3], [4] and [5]. 
2. PROOF OF THE THEOREMS. There is a simple formula for calculating the
exponent $e_p(n!)$ of the prime $p$ in $n!$, namely

$$e_p(n!) = \frac{n - S_p(n)}{p - 1}, \tag{3}$$

where $S_p(n)$ denotes the sum of the digits of $n$ written in base $p$, i.e. $S_p(n) = \sum_i n_i$,
where

$$n = n_s p^s + n_{s-1} p^{s-1} + \ldots + n_1 p + n_0 \tag{4}$$

for some integer $s$ and $0 \le n_i < p$, $n_s > 0$. The easy induction proof for (3) uses
the following argument: Let $p^j$ be the exact power of $p$ dividing $n$. Then
$e_p(n!) = e_p((n-1)!) + j$, and the last $j$ $p$-ary digits of $n$ are 0's. By subtracting 1,
these last $j$ digits turn into $(p-1)$'s, and the digit before them is decreased by 1.
The corresponding formula $S_p(n) = S_p(n-1) - j(p-1) + 1$ yields the induction
step.
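
Formula (3), $e_p(n!) = (n - S_p(n))/(p-1)$, is easy to test numerically. The following sketch (with hypothetical function names) compares the digit-sum formula with a direct count of the factors $p$ in $1 \cdot 2 \cdots n$.

```python
def digit_sum(n, p):
    """S_p(n): sum of the digits of n written in base p."""
    total = 0
    while n:
        total += n % p
        n //= p
    return total

def exponent_by_digit_sum(n, p):
    """e_p(n!) computed as (n - S_p(n)) / (p - 1), formula (3)."""
    return (n - digit_sum(n, p)) // (p - 1)

def exponent_by_counting(n, p):
    """e_p(n!) computed directly: add up the power of p in each m <= n."""
    total = 0
    for m in range(1, n + 1):
        while m % p == 0:
            total += 1
            m //= p
    return total

print(exponent_by_digit_sum(120, 2), exponent_by_counting(120, 2))
print(exponent_by_digit_sum(60, 3), exponent_by_counting(60, 3))
```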

From this one obtains the charming fact that $e_p\left(\binom{m+n}{n}\right)$ is exactly the number
of "carries" occurring while adding $m$ and $n$ in $p$-ary notation (4).

For a prime $p$ and a positive integer $n$, let $L_p(n)$ be the number of digits
$n_i \ge p/2$ in the above $p$-ary expansion. The following lemma, which was well
known to Legendre and Kummer, shows that the order of the prime $p$ in $\binom{2n}{n}$ is
closely related to the number of "large" digits in the $p$-ary representation of $n$.


Lemma 1. (i) For a positive integer $n$,

$$e_2\left(\binom{2n}{n}\right) = L_2(n) = S_2(n).$$

(ii) For a positive integer $n$ and a prime $p$,

$$e_p\left(\binom{2n}{n}\right) \ge L_p(n).$$


Proof: (i) If $n$ is written in base 2, a multiplication of $n$ by 2 simply means a shift
of digits, thus $S_2(2n) = S_2(n)$. By (3), we get

$$e_2\left(\binom{2n}{n}\right) = e_2((2n)!) - 2e_2(n!) = 2S_2(n) - S_2(2n) = S_2(n).$$

(ii) Let $2n = \sum_i n'_i p^i$ be the $p$-ary expansion of $2n$. For $n_i \ge p/2$, we clearly
have $n'_i \le 2n_i - p + 1$. Thus by (3)

$$e_p\left(\binom{2n}{n}\right) = \frac{1}{p-1}\bigl(2S_p(n) - S_p(2n)\bigr) = \frac{1}{p-1} \sum_i (2n_i - n'_i) \ge L_p(n).$$
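
The carry description and both parts of Lemma 1 can be checked by machine for small $n$. The following is a sketch under the assumption that the exponent of $p$ in $\binom{2n}{n}$ equals the number of carries when $n$ is added to itself in base $p$; all function names are illustrative.

```python
from math import comb

def digits(n, p):
    """Base-p digits of n, least significant first."""
    out = []
    while n:
        out.append(n % p)
        n //= p
    return out

def carries_when_doubling(n, p):
    """Number of carries produced when adding n + n in base p."""
    carry = count = 0
    for d in digits(n, p):
        carry = 1 if 2 * d + carry >= p else 0
        count += carry
    return count

def large_digits(n, p):
    """L_p(n): number of base-p digits n_i with n_i >= p/2."""
    return sum(1 for d in digits(n, p) if 2 * d >= p)

def exponent_in_central_binomial(n, p):
    """Exact power of p dividing C(2n, n), found by repeated division."""
    value, e = comb(2 * n, n), 0
    while value % p == 0:
        value //= p
        e += 1
    return e

for n in (60, 61, 62, 63, 64):
    print(n, [(p, exponent_in_central_binomial(n, p), large_digits(n, p))
              for p in (2, 3, 5, 7, 11)])
```

Each printed triple lists the prime, the true exponent, and the lower bound $L_p(n)$; for $p = 2$ the last two entries coincide.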
The next lemma shows that a slowly increasing sequence $(b_N)$ of integers with
$b_N \approx N$ for infinitely many $N$ satisfies $b_N \approx N$ for all $N$.


Lemma 2. Let $(b_N)_{N \ge 1}$ be a sequence of positive integers satisfying

$$b_N \le b_{N+1} \le b_N + 1 \quad \text{for all } N \ge 1. \tag{5}$$

Furthermore, let $(m_k)_{k \ge 1}$ be a strictly increasing sequence of positive integers. Define
integers $r_k$ as the difference of $b_{m_k}$ and $m_k$ by

$$b_{m_k} = m_k + r_k \quad \text{for } k \ge 1. \tag{6}$$

Then we have for $m_k \le N < m_{k+1}$

$$|b_N - N| \le \max(|r_k|, |r_{k+1}|).$$


Proof: By (5) and (6),

$$b_{m_k} \le b_N \le b_{m_k} + (N - m_k) = N + r_k.$$

Also

$$b_{m_{k+1}} \ge b_N \ge b_{m_{k+1}} - (m_{k+1} - N) = N + r_{k+1}.$$

Together, we get

$$N + r_{k+1} \le b_N \le N + r_k,$$

which proves the lemma.


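Proof of Theorem 1. We want to count the number of $n$, $0 \le n < N$, such that
$2^a \nmid \binom{2n}{n}$ for some fixed $a$. By Lemma 1 we may count as well the number of $n$ which
have at most $a - 1$ digits 1 in their binary representation. If $N$ is a power of 2, this
is a simple combinatorial exercise. For general prime powers, $N = p^k$ is the easiest
case, but we only obtain upper bounds according to Lemma 1(ii).

(i) By Lemma 1(i)

$$\begin{aligned}
E_{a,2}(2^k) &= \operatorname{card}\{ n : 0 \le n < 2^k,\ L_2(n) < a \} \\
&= \operatorname{card}\Bigl\{ (n_0, \ldots, n_{k-1}) \in \{0,1\}^k : \sum_i n_i < a \Bigr\} \\
&= \sum_{t=0}^{a-1} \operatorname{card}\Bigl\{ (n_0, \ldots, n_{k-1}) \in \{0,1\}^k : \sum_i n_i = t \Bigr\} \\
&= \sum_{t=0}^{a-1} \binom{k}{t}.
\end{aligned}$$

(ii) Since there are $(p-1)/2$ $p$-ary digits $> p/2$, and these are counted in
$L_p(n)$, we get by Lemma 1(ii)

$$\begin{aligned}
E_{a,p}(p^k) &= \operatorname{card}\Bigl\{ n : 0 \le n < p^k,\ e_p\Bigl(\binom{2n}{n}\Bigr) < a \Bigr\} \\
&\le \operatorname{card}\{ n : 0 \le n < p^k,\ L_p(n) < a \} \\
&= \sum_{t=0}^{a-1} \operatorname{card}\{ n : 0 \le n < p^k,\ L_p(n) = t \} \\
&= \sum_{t=0}^{a-1} \binom{k}{t} \left(\frac{p-1}{2}\right)^t \left(\frac{p+1}{2}\right)^{k-t} \\
&\le \left(\frac{p+1}{2}\right)^k \sum_{t=0}^{a-1} \binom{k}{t}.
\end{aligned}$$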

Proof of Theorem 2. It was convenient to calculate $E_{a,p}(N)$ for $N = p^k$. For
general $N$, it is almost impossible to determine the exact value of $E_{a,p}(N)$. For
this reason, we have to be satisfied with an approximation, obtainable by Lemma 2,
since $E_{a,p}$ is indeed growing slowly and is well-estimated for all $N = p^k$.
Let

$$b_N = b_N(a, p) = \operatorname{card}\left\{ n : 0 \le n < N,\ p^a \mid \binom{2n}{n} \right\} = N - E_{a,p}(N).$$

(i) By Theorem 1(i) for large $k$, and using the crude upper bound $\binom{k}{t} \le k^t$,

$$b_{2^k} = 2^k - E_{a,2}(2^k) = 2^k - \sum_{t=0}^{a-1} \binom{k}{t} \ge 2^k - a k^{a-1}.$$

Setting $m_k = 2^k$, we obtain by Lemma 2 for $2^k \le N < 2^{k+1}$ and a suitable
constant $C_a$

$$E_{a,2}(N) = |b_N - N| \le a(k+1)^{a-1} \le a(\log_2 N + 1)^{a-1} \le C_a (\log N)^{a-1}.$$

(ii) For general $p$, Theorem 1(ii) yields in a similar fashion for $p^k \le N < p^{k+1}$

$$E_{a,p}(N) \le \left(\frac{p+1}{2}\right)^{k+1} a (k+1)^{a-1}. \tag{7}$$

Now we choose $N$ resp. $k$ large enough that

$$\log \frac{p+1}{2} + \log a + (a-1) \log(k+1) < \varepsilon k \log p.$$

Adding $k \log \frac{p+1}{2}$ on both sides, we obtain

$$\begin{aligned}
(k+1) \log \frac{p+1}{2} + \log a + (a-1) \log(k+1)
&< \frac{\log \frac{p+1}{2}}{\log p}\, k \log p + \varepsilon k \log p \\
&= (\gamma_p + \varepsilon) k \log p \le (\gamma_p + \varepsilon) \log N.
\end{aligned}$$

Taking exponentials, we have exactly what is needed in (7) to prove the theorem.
Turán's Graph Theorem


Martin Aigner


One of the fundamental results in graph theory is the Theorem of Turán, proved
in 1941, which initiated extremal graph theory. (See the book [2] by Bollobás as a
standard reference.) Turán's theorem was rediscovered many times, and it is the
purpose of this article to discuss some of the most beautiful older and more recent
proofs.

Let us fix some notation. We consider graphs $G$ on the vertex-set $V =
\{1, 2, \ldots, n\}$ and edge-set $E \subseteq \binom{V}{2}$. If $i$ and $j$ are neighbors, then we write $ij \in E$.
A $k$-clique in $G$ is a complete subgraph of $G$ with $k$ vertices, denoted by $K_k$.
Turán posed the following question: Suppose $G$ does not contain a $k$-clique, how
many edges can $G$ maximally have? Let us denote this number by $t(n, k)$. We have
$t(n, 2) = 0$, and $t(n, k)$ is clearly an increasing function in $k$.

We readily obtain examples of such graphs by dividing $V$ into $k - 1$ pairwise
disjoint subsets, $V = V_1 \cup \cdots \cup V_{k-1}$, $|V_i| = n_i$, $n = n_1 + \cdots + n_{k-1}$, joining two
vertices if and only if they lie in distinct $V_i, V_j$. Let us denote the resulting graph by
$K_{n_1, \ldots, n_{k-1}}$; Figure 1 shows the graph $K_{2,2,3}$.

Figure 1.

The graph $K_{n_1, \ldots, n_{k-1}}$ contains $\sum_{i<j} n_i n_j$ edges, and it is clear that we obtain a
maximal number of edges among these graphs if we divide the numbers $n_i$ as
evenly as possible, i.e. $|n_i - n_j| \le 1$ for all $i, j$. If, in particular, $k - 1$ divides $n$,
then we may choose $n_i = n/(k-1)$ for all $i$, obtaining

$$\binom{k-1}{2} \left(\frac{n}{k-1}\right)^2 = \frac{k-2}{k-1} \cdot \frac{n^2}{2}$$

edges. Turán's theorem now states that this number is an upper bound for the
edge-number of any graph $G$ on $n$ vertices without $k$-cliques.


Theorem of Turán. Let $G(V, E)$ be a graph on $n$ vertices without a $k$-clique. Then

$$|E| \le \frac{(k-2) n^2}{2(k-1)}. \tag{1}$$
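
For very small $n$ the bound (1) and the extremal role of the evenly divided complete $(k-1)$-partite graph can be confirmed by exhaustive search. The sketch below is only an illustration, with hypothetical helper names; the brute-force routine enumerates edge sets and is hopeless beyond a handful of vertices.

```python
from itertools import combinations

def turan_graph_edges(n, k):
    """Edge count of the complete (k-1)-partite graph on n vertices
    whose part sizes differ by at most 1."""
    parts = k - 1
    sizes = [n // parts + (1 if i < n % parts else 0) for i in range(parts)]
    return sum(a * b for a, b in combinations(sizes, 2))

def has_clique(edges, n, k):
    """True if the graph on vertices 0..n-1 contains a k-clique."""
    edge_set = set(edges)
    return any(all(pair in edge_set for pair in combinations(c, 2))
               for c in combinations(range(n), k))

def max_edges_without_clique(n, k):
    """t(n, k) by brute force: try edge sets from largest to smallest."""
    all_pairs = list(combinations(range(n), 2))
    for m in range(len(all_pairs), -1, -1):
        for edges in combinations(all_pairs, m):
            if not has_clique(edges, n, k):
                return m

for n in range(2, 6):
    for k in (3, 4):
        t = max_edges_without_clique(n, k)
        print(n, k, t, turan_graph_edges(n, k), (k - 2) * n * n / (2 * (k - 1)))
```

The last printed column is the right-hand side of (1); it coincides with the edge count exactly when $k - 1$ divides $n$.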
More precisely, the theorem states that the graph $K_{n_1, \ldots, n_{k-1}}$ with $|n_i - n_j| \le 1$
for $i \ne j$ is the unique graph without a $k$-clique with the maximal number $t(n, k)$
of edges. These graphs are therefore called Turán graphs $T(n, k)$. In the following,
we will restrict ourselves to showing (1), but in some of the proofs we will
demonstrate that the graphs $T(n, k)$ attain the maximum for arbitrary $k$. The
uniqueness is then supplied by an easy argument.

As a warm-up let us look at the first interesting case $k = 3$: A triangle-free
graph contains at most $n^2/4$ edges, and the unique extremal graph is $K_{n/2, n/2}$ if $n$
is even, respectively $K_{(n-1)/2, (n+1)/2}$ if $n$ is odd. For this special case, proofs were
known before Turán's work. Before we look at two of them we need some more
notation.

The degree $d_i$ of vertex $i$ is the number of edges incident with $i$. By counting in
two ways we obtain

$$\sum_{i=1}^{n} d_i = 2|E|. \tag{2}$$

A set $A \subseteq V$ is called independent, if $A$ contains no edges. As an example, all the
defining vertex-sets $V_i$ in the graph $K_{n_1, \ldots, n_{k-1}}$ are independent. The number
$\alpha(G) = \max(|U| : U \subseteq V \text{ independent})$ is called the independence number of $G$.


k = 3: First Proof (Mantel 1906). Let $ij \in E$. Since $G$ contains no triangles we
have $(d_i - 1) + (d_j - 1) \le n - 2$ (see Figure 2), hence $d_i + d_j \le n$. Summing
over the edges we obtain

$$\sum_{ij \in E} (d_i + d_j) \le n|E|. \tag{3}$$

The number $d_i$ clearly appears $d_i$ times in the sum of (3), and we conclude

$$\sum_{ij \in E} (d_i + d_j) = \sum_{i=1}^{n} d_i^2 \le n|E|. \tag{4}$$

By the Cauchy-Schwarz inequality $(\sum x_i y_i)^2 \le \sum x_i^2 \cdot \sum y_i^2$ applied to $x_i = d_i$, $y_i = 1$
we obtain by (2) and (4)

$$n^2 |E| \ge \sum_{i=1}^{n} d_i^2 \cdot n \ge \Bigl( \sum_{i=1}^{n} d_i \Bigr)^2 = 4|E|^2, \tag{5}$$

and thus $|E| \le n^2/4$. □

Let us demonstrate how the uniqueness of the extremal graph $K_{n/2, n/2}$ is
established for $n$ even. (The case $n$ odd is analogous.) If $|E| = n^2/4$, then we must
have equality in (5). Now, we have equality in the Cauchy-Schwarz inequality iff
the vectors are multiples of each other. For the vector $(d_i)$ this means $d_i = d$ for
all $i$, and we conclude $n^2|E| = n^2 d^2$ and hence $d = n/2$ because of $|E| = n^2/4$.
But this immediately implies $G = K_{n/2, n/2}$.
k = 3: Second proof (Folklore). Let $A$ be a largest independent set, $|A| = \alpha$. Since
$G$ is triangle-free, we have $d_i \le \alpha$ for all $i$. The set $B = V \setminus A$ meets every edge of
$G$, whence we obtain $|E| \le \sum_{i \in B} d_i$ by counting in two ways. Setting $|B| = \beta =
n - \alpha$ we obtain by the inequality of the arithmetic-geometric mean

$$|E| \le \sum_{i \in B} d_i \le \alpha\beta \le \left(\frac{\alpha + \beta}{2}\right)^2 = \frac{n^2}{4}. \qquad \square$$

Now we turn to the proofs of the general case (1).


First proof (Turán 1941). We use induction on $n$. (1) is trivially true for small $n$.
Let $G$ be a graph on $V = \{1, \ldots, n\}$ without $k$-cliques with a maximal number of
edges. $G$ certainly contains $(k-1)$-cliques, since otherwise we could add edges.
Let $A$ be a $(k-1)$-clique, $B = V \setminus A$, $|B| = n - k + 1$ (Figure 3).

Figure 3.

$A$ contains $\binom{k-1}{2}$ edges, and we now estimate the edge-number $e_B$ in $B$ and the
edge-number $e_{A,B}$ between $A$ and $B$. By induction, we have
$e_B \le \frac{k-2}{2(k-1)} (n-k+1)^2$. Since $G$ has no $k$-clique, every $j \in B$ is adjacent to at most
$k - 2$ vertices in $A$, and we obtain $e_{A,B} \le (k-2)(n-k+1)$. Altogether, this
yields

$$|E| \le \binom{k-1}{2} + \frac{k-2}{2(k-1)} (n-k+1)^2 + (k-2)(n-k+1), \tag{6}$$

which is precisely $\frac{k-2}{2(k-1)} n^2$. □


Second proof (Erdős 1970). This proof makes use of the structure of the Turán
graphs. Let $m \in V$ with $d_m = \max_{1 \le j \le n} d_j$. We denote by $S$ the neighbors of
$m$, $|S| = d_m$, and set $T = V \setminus S$. As $G$ contains no $k$-clique, and $m$ is adjacent to
all of $S$, we note that $S$ contains no $(k-1)$-clique. We now construct the following
graph $H$ on $V$ (see Figure 4). $H$ corresponds to $G$ on $S$ and contains all edges
between $S$ and $T$, but no edges within $T$.

In other words, $T$ is an independent set in $H$, and we conclude that $H$ has
again no $k$-cliques. Let $d'_j$ be the degree of $j$ in $H$. If $j \in S$, then we certainly have
$d'_j \ge d_j$ by the construction of $H$, and for $j \in T$, we see $d'_j = |S| = d_m \ge d_j$ by the
choice of $m$. We infer $|E(H)| \ge |E|$, and conclude that among all graphs with a
maximal number of edges, there must be one of the form of $H$. Applying induction
on $S$, we thus infer that among the graphs with a maximal number of edges there is
a graph $K_{n_1, \ldots, n_{k-1}}$, which implies $|E| \le \sum_{i<j} n_i n_j$ and therefore (1). □


Figure 4.


We note that this proof yields the full statement $|E| \le |E(H)|$, $H$ = Turán graph.


Third proof (Moon-Moser 1962). This proof generalizes the idea of the first proof
for $k = 3$ and yields a quantitative estimate for the number of $h$-cliques. Let $G$ be
any graph on $V = \{1, \ldots, n\}$ and denote by $\mathcal{C}_h$ the set of $h$-cliques in $G$ with
$|\mathcal{C}_h| = C_h$. As examples we have $C_1 = n$, $C_2 = |E|$, $C_3$ = number of triangles. For
$A \in \mathcal{C}_h$ we denote by $d(A)$ the number of $(h+1)$-cliques containing $A$. Counting
in two ways we obtain

$$\sum_{A \in \mathcal{C}_h} d(A) = (h+1) C_{h+1} \quad (h \ge 1), \tag{7}$$

in generalization of (2). For $A \in \mathcal{C}_h$ $(h \ge 2)$ let us denote by $A^{(1)}, \ldots, A^{(h)}$ the
$(h-1)$-cliques contained in $A$.


Claim. For any graph $G$

$$\frac{C_{h+1}}{C_h} \ge \frac{h^2}{h^2 - 1} \left( \frac{C_h}{C_{h-1}} - \frac{n}{h^2} \right) \quad (h \ge 2). \tag{8}$$

Consider $A \in \mathcal{C}_h$, $B = V \setminus A$, $|B| = n - h$. Among the vertices $j \in B$ there are
precisely $d(A)$ vertices which are adjacent to all of $A$. Every other vertex in $B$ is
adjacent to at most one $(h-1)$-clique $A^{(i)}$, thereby forming an $h$-clique (Figure 5).
We thus obtain (note $-1$ because of $A^{(i)} \subset A$)

$$\sum_{i=1}^{h} \bigl( d(A^{(i)}) - 1 - d(A) \bigr) + d(A) \le n - h,$$

Figure 5.

hence

$$\sum_{i=1}^{h} d(A^{(i)}) - (h-1) d(A) \le n.$$

Summation over $A \in \mathcal{C}_h$ yields

$$\sum_{A \in \mathcal{C}_h} \sum_{i=1}^{h} d(A^{(i)}) - (h-1) \sum_{A \in \mathcal{C}_h} d(A) \le n C_h. \tag{9}$$

As in (4) we conclude

$$\sum_{A \in \mathcal{C}_h} \sum_{i=1}^{h} d(A^{(i)}) = \sum_{B \in \mathcal{C}_{h-1}} d(B)^2, \tag{10}$$

and by (7) we have

$$(h-1) \sum_{A \in \mathcal{C}_h} d(A) = (h^2 - 1) C_{h+1}. \tag{11}$$

Substituting (10) and (11) into (9) gives us

$$\sum_{B \in \mathcal{C}_{h-1}} d(B)^2 \le n C_h + (h^2 - 1) C_{h+1}. \tag{12}$$

By the Cauchy-Schwarz inequality applied to the vectors $(d(B))$, $(1)$ of length
$C_{h-1}$, we finally obtain

$$n C_h + (h^2 - 1) C_{h+1} \ge \sum_{B \in \mathcal{C}_{h-1}} d(B)^2 \ge \frac{1}{C_{h-1}} \Bigl( \sum_{B \in \mathcal{C}_{h-1}} d(B) \Bigr)^2 = \frac{h^2 C_h^2}{C_{h-1}},$$

which is precisely (8).
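In order to prove (1) we must find a relationship between (8) and the edge-number
$|E|$. Let us set

$$|E| = \left( 1 - \frac{1}{\vartheta} \right) \frac{n^2}{2} \quad (\vartheta \in \mathbf{R}). \tag{13}$$

Since the right-hand side of (13) is increasing in $\vartheta$, we must thus prove $\vartheta \le k - 1$
for graphs without $k$-cliques.


Claim. We have

$$\frac{C_{h+1}}{C_h} \ge \frac{\vartheta - h}{\vartheta} \cdot \frac{n}{h+1} \quad (h \ge 1). \tag{14}$$

For $h = 1$ we have $C_2 = |E|$, $C_1 = n$, and (14) is satisfied with equality by the
definition of $\vartheta$. Using (8) and induction on $h$ we infer

$$\frac{C_{h+1}}{C_h} \ge \frac{h^2}{h^2 - 1} \left( \frac{\vartheta - h + 1}{\vartheta} \cdot \frac{n}{h} - \frac{n}{h^2} \right) = \frac{(\vartheta - h)(h-1)}{h^2 - 1} \cdot \frac{n}{\vartheta} = \frac{\vartheta - h}{\vartheta} \cdot \frac{n}{h+1},$$

as claimed.
Now, if $G$ contains no $k$-clique, then $C_k = 0$, and we infer $\vartheta \le k - 1$ from (14)
for $h + 1 = k$. □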
EXAMPLE. Consider (8) for $h = 2$. In this case, the inequality states that any
graph satisfies

$$C_3 \ge \frac{4}{3} \left( \frac{|E|}{n} - \frac{n}{4} \right) |E| = \frac{|E|}{3n} \bigl( 4|E| - n^2 \bigr).$$

We conclude that a graph $G$ on an even number $n$ of vertices with $|E| = n^2/4 + 1$
not only contains one triangle (as it must by Turán's Theorem), but more than
$n/3$. If we add one edge to $K_{n/2, n/2}$, then we obtain $n/2$ triangles, and it can be
easily shown that this holds for any graph with $n^2/4 + 1$ edges.
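
The count of $n/2$ triangles in $K_{n/2,n/2}$ plus one edge, and the lower bound $\frac{|E|}{3n}(4|E| - n^2)$ coming from (8) with $h = 2$, can be compared directly. The following sketch uses hypothetical helper names and labels the two sides of the bipartition $0, \ldots, n/2 - 1$ and $n/2, \ldots, n-1$.

```python
from fractions import Fraction
from itertools import combinations

def count_triangles(n, edges):
    """Number of 3-cliques in the graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    return sum(1 for t in combinations(range(n), 3)
               if all(frozenset(p) in edge_set for p in combinations(t, 2)))

def bipartite_plus_edge(n):
    """Edges of K_{n/2,n/2} plus the extra edge {0, 1}; n even, n >= 4."""
    half = n // 2
    edges = [(i, j) for i in range(half) for j in range(half, n)]
    edges.append((0, 1))  # both endpoints lie in the first part
    return edges

def triangle_lower_bound(n, m):
    """|E| (4|E| - n^2) / (3n) for a graph with n vertices and m edges."""
    return Fraction(m * (4 * m - n * n), 3 * n)

for n in (4, 6, 8, 10):
    edges = bipartite_plus_edge(n)
    print(n, len(edges), count_triangles(n, edges),
          triangle_lower_bound(n, len(edges)))
```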

So far, the proofs have employed counting techniques; the following three
proofs use entirely different ideas.


Fourth proof (Motzkin-Straus 1965). Let $G$ be an arbitrary graph on $V = \{1, \ldots, n\}$.
By $\omega = \omega(G)$ we denote the number of vertices in a largest clique of $G$; $\omega(G)$ is
called the clique-number. Now, we associate to each $i \in V$ a variable $x_i$ (over $\mathbf{R}$)
and consider the function $f(x_1, \ldots, x_n) = 2 \sum_{ij \in E} x_i x_j$.


Claim. We have

$$1 - \frac{1}{\omega} = \max\Bigl\{ 2 \sum_{ij \in E} x_i x_j : \sum_{i=1}^{n} x_i = 1,\ x_i \ge 0 \text{ for all } i \Bigr\}. \tag{15}$$

Since $f$ is continuous on a compact set, there exists $x$ with $f(x) = \max$. Among
all such vectors $x$, we choose one with a maximal number of $x_i = 0$. Let $C =
\{i \in V : x_i > 0\}$. We show first that $C$ is a clique. Suppose this is false with $1, 2 \in C$
but $12 \notin E$. For any $t \in \mathbf{R}$ in the range $-x_1 \le t \le x_2$ the vector $x_t = (x_1 + t,
x_2 - t, x_3, \ldots, x_n)$ satisfies the conditions in (15), and furthermore, $f(x_t)$ is a linear
function in $t$, since the product $(x_1 + t)(x_2 - t)$ does not appear in $f(x_t)$ because
of $12 \notin E$. Since by the choice of $x$, $f(x_t)$ assumes the maximum at $t = 0$ (i.e. in
the interior) we conclude that $f(x_t)$ is, in fact, constant for all $t$. For $t = x_2$, $\tilde{x} =
(x_1 + x_2, 0, x_3, \ldots, x_n)$, we therefore obtain $f(\tilde{x}) = f(x)$, contradicting the choice
of $x$.
We can thus assume $f(x) = \max$ with $C = \{i : x_i > 0\}$ a clique. Since

$$1 = (x_1 + \ldots + x_n)^2 = 2 \sum_{ij \in E} x_i x_j + \sum_{i \in C} x_i^2$$

we conclude that $f(x)$ is maximal if and only if $\sum_{i \in C} x_i^2$ is minimal. Under the
assumption $\sum_{i \in C} x_i = 1$ this is clearly the case for $x_i = 1/|C|$, and we obtain

$$f(x) = 1 - \sum_{i \in C} x_i^2 = 1 - \frac{1}{|C|} \le 1 - \frac{1}{\omega}$$

with equality for $|C| = \omega$, which is what we wanted to prove.


Inequality (1) is now an immediate consequence. Setting $x_i = 1/n$, we have
$f(x) = 2|E|/n^2$ and therefore

$$\frac{2|E|}{n^2} \le 1 - \frac{1}{\omega(G)} \le 1 - \frac{1}{k-1} = \frac{k-2}{k-1}$$

since $G$ contains no $k$-clique. □
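
The identity $\max f = 1 - 1/\omega$ can be observed on a small graph. The sketch below is an illustration only: the example graph (a triangle with a pendant path) and all helper names are assumptions, and the maximum is searched over a rational grid on the simplex rather than computed analytically. The grid is chosen fine enough to contain the uniform weighting of the largest clique.

```python
from fractions import Fraction
from itertools import combinations

def f(x, edges):
    """f(x) = 2 * sum of x_i x_j over the edges ij."""
    return 2 * sum(x[i] * x[j] for i, j in edges)

def clique_number(n, edges):
    """omega(G) by brute force over all vertex subsets (small n only)."""
    edge_set = {frozenset(e) for e in edges}
    return max(k for k in range(1, n + 1)
               if any(all(frozenset(p) in edge_set for p in combinations(c, 2))
                      for c in combinations(range(n), k)))

def simplex_grid(n, steps):
    """All vectors with entries in {0, 1/steps, ..., 1} summing to 1."""
    def compositions(left, slots):
        if slots == 1:
            yield (left,)
            return
        for first in range(left + 1):
            for rest in compositions(left - first, slots - 1):
                yield (first,) + rest
    for counts in compositions(steps, n):
        yield [Fraction(c, steps) for c in counts]

# Hypothetical example: triangle 0-1-2 with the path 2-3-4 attached.
N = 5
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

omega = clique_number(N, EDGES)
best_on_grid = max(f(x, EDGES) for x in simplex_grid(N, 6))
uniform = [Fraction(1, N)] * N
print(omega, best_on_grid, 1 - Fraction(1, omega), f(uniform, EDGES))
```

The last value is $2|E|/n^2$, the quantity used to deduce (1) from (15).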


Fifth proof (Li-Li 1981, Kleitman-Lovász 1994). The basis for this proof is again an
algebraic structure. To every vertex $i \in V$ of the graph $G$ we again associate a
variable $x_i$ and consider the polynomial

$$p_G(x_1, \ldots, x_n) = \prod_{i<j,\ ij \notin E} (x_i - x_j). \tag{16}$$

The fundamental observation on the polynomial $p_G$ is the following obvious fact:

$$\omega(G) \le k - 1 \iff \text{the identification } x_{i_1} = \cdots = x_{i_k} \text{ of any } k \text{ variables in } p_G \text{ yields the zero-polynomial.} \tag{17}$$

Let $P(n, k)$ be the set of real polynomials in $n$ variables which satisfy the
right-hand side of (17). $P(n, k)$ is clearly an ideal in $\mathbf{R}[x_1, \ldots, x_n]$. Let $\mathcal{H}(n, k)$ be
the following family of graphs on $V = \{1, \ldots, n\}$: $H$ is in $\mathcal{H}(n, k)$ if and only if the
vertex-set $V$ can be partitioned into $k - 1$ disjoint independent subsets (in the
language of graph theory, this means $H$ is $(k-1)$-partite or $(k-1)$-colorable). In
particular, all our graphs $K_{n_1, \ldots, n_{k-1}}$ are in $\mathcal{H}$ and therefore all Turán graphs. By
our remarks on the graphs $K_{n_1, \ldots, n_{k-1}}$ we can therefore state

$$|E(H)| \le \frac{(k-2) n^2}{2(k-1)} \quad \text{for all } H \in \mathcal{H}(n, k). \tag{18}$$

By $\overline{P}(n, k)$ we denote the ideal in $\mathbf{R}[x_1, \ldots, x_n]$ generated by $\{p_H : H \in \mathcal{H}(n, k)\}$.
Since we have $\omega(H) \le k - 1$ for any such graph, we infer $\overline{P}(n, k) \subseteq P(n, k)$.


Claim. We have $P(n, k) = \overline{P}(n, k)$.
Before proving this claim, let us see how Turán's theorem follows from it. Let $G$
be a graph with $\omega(G) \le k - 1$. Then $p_G \in P(n, k) = \overline{P}(n, k)$, i.e.

$$p_G = \sum_{i} q_i\, p_{H_i} \quad \text{with } H_i \in \mathcal{H}(n, k),\ q_i \in \mathbf{R}[x_1, \ldots, x_n]. \tag{19}$$

By (16), $p_G$ is a homogeneous polynomial of degree $\deg(p_G) = \binom{n}{2} - |E(G)|$, and
analogously $\deg(p_{H_i}) = \binom{n}{2} - |E(H_i)|$. We thus infer from (19),
$\binom{n}{2} - |E(G)| \ge \binom{n}{2} - |E(H_i)|$ for some $i$, and therefore (1) from (18).

Let $f \in P(n, k)$. To prove $f \in \overline{P}(n, k)$ we use induction on $n$. For $n = 2$ there
is nothing to prove. For a subset $S \subseteq \{1, \ldots, n-1\}$ we denote by $f_S$ the polynomial
which results from $f$ by identifying $x_i = x_n$ for all $i \in S$. Clearly, $f_S \in P(n, k)$
and hence $f_S \in \overline{P}(n, k)$ for $S \ne \emptyset$ by induction (note $\overline{P}(n-1, k) \subseteq \overline{P}(n, k)$). Now
consider the polynomial

$$g = \sum_{S \subseteq \{1, \ldots, n-1\}} (-1)^{|S|} f_S. \tag{20}$$

Cancelling terms we see that every identification $x_i = x_n$ $(i = 1, \ldots, n-1)$ in $g$
yields the zero-polynomial. We conclude that $(x_1 - x_n) \cdots (x_{n-1} - x_n)$ divides $g$,
hence

$$g = (x_1 - x_n) \cdots (x_{n-1} - x_n)\, h. \tag{21}$$

Since $f_S \in P(n, k)$ for all $S$, we have $g \in P(n, k)$ by (20), whence $h$ becomes by
(21) the zero-polynomial whenever we identify $k$ of the variables $x_1, \ldots, x_{n-1}$ in $h$.
Expanding $h$ with respect to $x_n$, we see that every coefficient polynomial $p$ of a
power $x_n^j$ lies in $P(n-1, k)$ and hence in $\overline{P}(n-1, k)$ by induction. We conclude
that the polynomial $g$ is a sum of expressions

$$q\, (x_1 - x_n) \cdots (x_{n-1} - x_n)\, p_H \tag{22}$$

with $H \in \mathcal{H}(n-1, k)$, $q \in \mathbf{R}[x_1, \ldots, x_n]$.
Adding the vertex $n$ to each such $H$ without edges from $n$ to $H$, we obtain
$(x_1 - x_n) \cdots (x_{n-1} - x_n)\, p_H = p_{H'}$ with $H' \in \mathcal{H}(n, k)$. This now implies $g \in \overline{P}(n, k)$
by (22), and thus

$$f = g - \sum_{S \ne \emptyset} (-1)^{|S|} f_S \in \overline{P}(n, k),$$

as claimed. □


REMARK. We note that this proof again yields the full implication of (1), that the
Turán graphs attain the maximal number of edges, and it can be shown that the
polynomials $p_H$, $H$ = Turán graph, already generate the ideal $P(n, k)$.


Sixth proof (Alon-Spencer 1992). Our last and perhaps most elegant proof uses
ideas from probability theory. Let $G$ be an arbitrary graph on $V = \{1, \ldots, n\}$.


Claim. We have

$$\omega(G) \ge \sum_{i=1}^{n} \frac{1}{n - d_i}. \tag{23}$$

We choose with equal probability $1/n!$ a permutation $\pi_1, \pi_2, \ldots, \pi_n$ of $V$ and
construct the following set $C$. We put $\pi_i$ into $C$ if and only if $\pi_i$ is adjacent to all
$\pi_j$ $(j < i)$. By definition $C$ is a clique in $G$. Let $X = |C|$ be the corresponding
random variable. We have $X = \sum_{i=1}^{n} X_i$, where $X_i$ is the indicator random variable
of $i$, i.e. $X_i = 1$ or $0$ depending on $i \in C$ or $i \notin C$. Now we note $i \in C$ with
respect to the permutation $(\pi_1, \ldots, \pi_n)$ iff $i$ appears before all $n - 1 - d_i$
non-neighbors of $i$, or in other words, if $i$ is the first among $i$ and its non-neighbors.
We conclude $EX_i = 1/(n - d_i)$ for the expectation and hence

$$E(|C|) = EX = \sum_{i=1}^{n} EX_i = \sum_{i=1}^{n} \frac{1}{n - d_i}$$

by the linearity of expectation. Consequently, there must be a clique $C$ with at
least $E(|C|)$ vertices, and this is just our claim (23).
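
The expectation $E(|C|) = \sum_i 1/(n - d_i)$ can be verified exactly on a small graph by averaging over all $n!$ orderings. The sketch below is an illustration; the example graph and the function names are assumptions, and exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction
from itertools import permutations

def greedy_clique(order, adj):
    """Keep order[i] iff it is adjacent to every earlier vertex order[j], j < i."""
    return [v for i, v in enumerate(order)
            if all(u in adj[v] for u in order[:i])]

def expected_clique_size(adj):
    """Exact E|C|: the average of |C| over all n! orderings (small n only)."""
    vertices = sorted(adj)
    sizes = [len(greedy_clique(order, adj)) for order in permutations(vertices)]
    return Fraction(sum(sizes), len(sizes))

def degree_sum_bound(adj):
    """sum over vertices of 1 / (n - d_i)."""
    n = len(adj)
    return sum(Fraction(1, n - len(adj[v])) for v in adj)

# Hypothetical example: triangle 0-1-2 with the path 2-3-4 attached.
ADJ = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

print(expected_clique_size(ADJ), degree_sum_bound(ADJ))
```

For this graph the largest clique has 3 vertices, so the printed value is indeed a lower bound for the clique number.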

To deduce Turán's theorem from (23) we use the Cauchy-Schwarz inequality in
the form

$$n^2 = \Bigl( \sum_{i=1}^{n} \sqrt{x_i} \cdot \frac{1}{\sqrt{x_i}} \Bigr)^2 \le \sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} \frac{1}{x_i}$$

with $x_i = n - d_i$. Indeed, (23) and (2) imply

$$\omega(G) \ge \frac{n^2}{\sum_{i=1}^{n} (n - d_i)} = \frac{n^2}{n^2 - 2|E|}. \tag{24}$$

If $G$ has no $k$-clique, then $\omega(G) \le k - 1$ and (24) reduces precisely to (1). □


REMARK. Inequality (23) was first proved in Wei [10] by successively removing
vertices similar to the second proof.
The fact is, although DNA testing may be as foolproof as fingerprinting,
it doesn't cause excitement. It's difficult to respond to. It's like
advanced math, brilliant but boring, astonishing but passionless. It 
made everyone eager to move on to the next phase of the trial, which 
consisted of autopsy pictures... 


From “If the Gloves Fit” by Dominick Dunne, in Vanity Fair/August 1995. 


Submitted by J. Foster 
Weber State University 


Answer to Picture Puzzle (p. 797)


A. S. Besicovitch.
NOTES


Edited by: John Duncan 


More on Kummer’s Test 


Hans Samelson 


These are some remarks related to the interesting Note "Kummer's Test Gives
Characterizations for Convergence and Divergence of all Positive Series" by Jengching
Tong ([4]).

(All sequences below have positive terms.) In slightly changed notation (see
below), Kummer's test for convergence (developed in [2]) says: a series $\sum a_i$ is
convergent if there is a (strongly decreasing) sequence $\{c_i\}$ with $c_{i-1} - c_i \ge c \cdot a_i$;
the positive constant $c$ is not important, and we shall mostly take it equal to 1. For
a proof note that $\sum (c_{i-1} - c_i)$ is convergent (its partial sums are bounded by $c_0$,
by telescoping), and then, by the inequality, $\sum a_i$ also converges (comparison test).
(This is essentially the proof given by U. Dini [1], p. 66 (Opere) and by O. Stolz [3],
p. 259. Kummer assumed $\lim c_i = 0$; however one can always adjust the $c_i$ so that
this holds, as noted by Dini, [1], p. 49 (Opere).)

But note now that any convergent series $\sum b_i$ can be written in the form
$\sum (c_{i-1} - c_i)$ as above; take for $c_i$ the series remainder $\sum_{j > i} b_j$. Thus Kummer's
convergence criterion can be stated as: a series $\sum a_i$ converges if (and trivially only
if) there is a converging series $\sum b_i$ that majorizes it term by term; it is exactly the
basic comparison test (which had been introduced explicitly by Cauchy in 1821)!
(Kummer's own proof does not quote the comparison test; he proves his assertion
by estimating the series sections $\sum_{i=m}^{n} a_i$; his argument is actually quite similar to
that of Dini and Stolz.)
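
As a small illustration of the remainder construction, take the convergent series with terms $b_i = 1/(i(i+1))$, $i \ge 1$ (an example chosen here, not taken from the note). Its remainders $c_i = \sum_{j > i} b_j = 1/(i+1)$ are known in closed form, so the telescoping identity $c_{i-1} - c_i = b_i$ and the bound of the partial sums by $c_0$ can be checked exactly. The function names are hypothetical.

```python
from fractions import Fraction

def b(i):
    """Term of the example series: b_i = 1 / (i (i + 1)), i >= 1."""
    return Fraction(1, i * (i + 1))

def c(i):
    """Remainder sum_{j > i} b_j, which equals 1 / (i + 1) for this series."""
    return Fraction(1, i + 1)

def partial_sum(n):
    """b_1 + ... + b_n."""
    return sum((b(i) for i in range(1, n + 1)), Fraction(0))

for n in (1, 2, 5, 10, 100):
    # Telescoping: the partial sum equals c_0 - c_n and so stays below c_0 = 1.
    print(n, partial_sum(n), c(0) - c(n))
```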

The opening paragraph of Kummer's paper (written when he was 23, two years
after his Ph.D., while he was teaching at a Gymnasium in a small town in Silesia) is
quite striking; he says that since no universally valid criterion for convergence or
divergence has ever been found he was looking for a method to test an arbitrary
series, the test being contained in the following theorems. (Indeed he says later on
that his convergence test and his divergence test [see below] give a decision for
every series.) I wonder whether he realized how close to the comparison test he
was.

His real contribution and insight here was to take the $b_i$ in the form $c_{i-1} - c_i$
and to write the $c_i$ as products $m_i a_i$ (actually he wrote his condition as $f(k) =
m_k a_k / a_{k+1} - m_{k+1} \ge c$, or rather "$f(k) > 0$ for $k = \infty$"!, assuming apparently
that the sequence $f(k)$ must converge); as he himself, Dini, and others showed,
fairly simple choices of the $m_i$ lead to many good tests for specific series or classes
of series. (Taking $m_i = 1$, one gets Cauchy's ratio criterion, comparison with the
geometric series; with $m_k = k$ one is led to Raabe's criterion, etc. (see [4]).)

Kummer also gave a test for divergence (not the one quoted in [4]; that one was
introduced by Dini in [1]). It seems to me that the proof contains a mistake that
makes it invalid; Kummer seems to assume that the sequence $m_k a_k / a_{k+1} - m_{k+1}$
is monotone decreasing, for which behavior I can't find any reason. Dini, who
recapitulates this proof in [1], seems to assume that for a sequence $h_n$ that goes to
0 the quotients $h_m / h_n$ for $m > n$ are bounded.

And as a last remark: Dini's test for divergence says that $\sum a_i$ is divergent if
there exists a (weakly) monotone increasing sequence $c_i = m_i a_i$ with
$\sum 1/m_i$ $(= \sum a_i / c_i)$ diverging. The monotonicity condition can be written as
$a_i / a_{i+1} \le (1/m_i)/(1/m_{i+1})$, and by a standard argument that makes $\sum a_i$
diverge also. (That is Dini's proof.)

Let us here replace the monotonicity of the $c_i$ by the weaker condition that the
$c_i$ are bounded below by a positive number, say by 1. Then we have $a_i \ge a_i / c_i$,
with the series of the latter terms diverging by assumption, and thus a slightly
improved version of Dini's criterion also turns out to be identical with the
comparison test (divergence version)! The interesting aspect is again that writing
the $c_i$ as $m_i a_i$ makes it easy to set up specific tests.
The Derivation of the
Exponential Map of Matrices 


G. M. Tuynman 


The exponential map, which links the Lie algebra to its Lie group, is of course an
analytic map, and as such it has a derivative. Although the explicit expression for
this derivative is not so complicated, the way to obtain it seems long and difficult.
For instance, in [H] affine connections and differential equations are used, in [P] a
Taylor expansion is used and terms of order 2 ($O(t^2)$) are neglected, in [V] a
complicated analysis of Taylor series using enveloping algebras is used; in [MT] a
rather simple argument using differential equations is used, but this argument is
only valid for matrices.

We present here a rather elementary way to obtain this derivative. For ease of 
exposition we will do it for matrices, but only cosmetic changes are needed to 
make it a valid computation for any Lie group. 
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Theorem. Let exp = e: M(n,R) > M(n,R) denote the exponential map on n X n 
matrices with real entries. Then: 


d 1 — e7ad(X) 


ay} 


where ad(X) denotes the adjoint representation Y > ad(X XY) = X-Y — Y-X, and 
where the quotient should be interpreted as the formal power series. 


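Proof: We introduce the matrix $A(X, Y)$ defined as:

$$A(X,Y) = e^{-X}\cdot\frac{d}{dt}\Big|_{t=0} e^{X+tY}.$$

The map $A$ is obviously continuous in $X$ and $Y$, and, moreover, it is linear in $Y$ by
definition of the derivative. Applying the Leibniz rule to the equality $e^{X+tY} =
\exp((1/n)X + t(1/n)Y)^{n}$, valid for any $n \in \mathbb{N}$, gives us:

$$\frac{d}{dt}\Big|_{t=0} e^{X+tY} = \sum_{k=0}^{n-1} \exp\Big(\frac{1}{n}X\Big)^{n-k-1}\cdot\left(\frac{d}{dt}\Big|_{t=0}\exp\Big(\frac{1}{n}X + t\,\frac{1}{n}Y\Big)\right)\cdot\exp\Big(\frac{1}{n}X\Big)^{k}.$$

Using the definition of $A$ we then compute:

$$\begin{aligned}
e^{-X}\cdot\frac{d}{dt}\Big|_{t=0} e^{X+tY}
&= \sum_{k=0}^{n-1} \exp\Big(-\frac{1}{n}X\Big)^{k}\cdot A\Big(\frac{1}{n}X, \frac{1}{n}Y\Big)\cdot\exp\Big(\frac{1}{n}X\Big)^{k} \\
&= \frac{1}{n}\sum_{k=0}^{n-1} \operatorname{Ad}\big(e^{-X/n}\big)^{k}\Big(A\Big(\frac{1}{n}X, Y\Big)\Big) \\
&= \frac{1 - \operatorname{Ad}(e^{-X})}{n\,\big(1 - \operatorname{Ad}(e^{-X/n})\big)}\Big(A\Big(\frac{1}{n}X, Y\Big)\Big) \\
&\xrightarrow[n\to\infty]{} \frac{1 - e^{-\operatorname{ad}(X)}}{\operatorname{ad}(X)}\big(A(0, Y)\big).
\end{aligned}$$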


To obtain the second equality we used the linearity of $A$ in $Y$ and the definition of
the Adjoint representation: $\operatorname{Ad}(B)(A) = B\cdot A\cdot B^{-1}$. For the third equality we
used the formula for the sum of a geometric progression with factor $\operatorname{Ad}(e^{-X/n})$.
For the limit we used the continuity of $A$ in $X$, the fact that ad is the derivative of
Ad (for the limit $n \to \infty$ in the denominator!), and that the exponential map
intertwines ad and Ad: $\operatorname{Ad}(e^{X}) = e^{\operatorname{ad}(X)}$. Since an elementary calculation shows
that $A(0, Y) = Y$, the theorem follows when we multiply by $e^{X}$.
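
The formula of the theorem can be checked numerically. The following is only a sketch in standard-library Python, not part of the original note: the helper names (`matmul`, `axpy`, `expm`, `dexp_formula`, `dexp_numeric`) and the sample matrices are illustrative assumptions. It compares the formal power series for $(1 - e^{-\operatorname{ad}(X)})/\operatorname{ad}(X)$, namely $\sum_{j \ge 0} (-\operatorname{ad}(X))^{j}/(j+1)!$, with a central finite difference of $t \mapsto e^{X+tY}$.

```python
# Pure-Python sanity check for small real matrices (illustrative helper names).

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def axpy(A, B, s=1.0):
    """Return A + s*B entrywise."""
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def expm(A, terms=40):
    """Matrix exponential by its power series (adequate for small norms)."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = total
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, A)]
        total = axpy(total, term)
    return total

def ad(X, Y):
    """ad(X)(Y) = X*Y - Y*X."""
    return axpy(matmul(X, Y), matmul(Y, X), -1.0)

def dexp_formula(X, Y, terms=40):
    """e^X times sum_j (-ad X)^j (Y) / (j+1)!."""
    total = [[0.0] * len(X) for _ in X]
    term = Y  # the j = 0 term
    for j in range(terms):
        total = axpy(total, term)
        term = [[-v / (j + 2) for v in row] for row in ad(X, term)]
    return matmul(expm(X), total)

def dexp_numeric(X, Y, h=1e-5):
    """Central difference approximation of d/dt e^{X+tY} at t = 0."""
    plus, minus = expm(axpy(X, Y, h)), expm(axpy(X, Y, -h))
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
            for rp, rm in zip(plus, minus)]

X = [[0.3, 1.0], [-0.5, 0.2]]
Y = [[0.1, -0.4], [0.7, 0.25]]
gap = max(abs(f - g)
          for rf, rg in zip(dexp_formula(X, Y), dexp_numeric(X, Y))
          for f, g in zip(rf, rg))
print(gap)  # should be tiny compared with the matrix entries
```

The truncated power series is chosen only because it needs no external library; for matrices of large norm a dedicated matrix-exponential routine would be the safer choice.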




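Remark. Readers who feel uneasy in taking the limit $n \to \infty$ need only check the
following convergence of power series of a single complex variable $z$:

$$\frac{e^{z} - 1}{n\,(e^{z/n} - 1)} = \frac{1}{n}\sum_{k=0}^{n-1} e^{kz/n} \;\xrightarrow[n\to\infty]{}\; \sum_{i=0}^{\infty} \frac{z^{i}}{(i+1)!} = \frac{e^{z} - 1}{z}.$$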
The Kantorovich Inequality


Vlastimil Pták


The inequality appears first in a survey article on functional analysis and applied 
mathematics by L. V. Kantorovic; it is used in investigations concerning the 
condition number of operators and has important applications in estimating 
convergence of methods of steepest descent for solving equations. In a number of 
subsequent papers the connection of the inequality with an inequality given by 
Pólya and Szegő was cleared up and a number of proofs, some of considerable
complexity, of the inequality and of different variants thereof appeared in the 
literature. 

In view of the importance of the inequality one more note on the subject might 
be of interest. It is not difficult to see that the result is essentially based on the 
inequality between the geometric and arithmetic mean; to emphasise this we 
restate it in a form using the two means which immediately suggests a simple and 
natural proof. 


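The Kantorovich inequality. Suppose $x_1 < x_2 < \cdots < x_n$ are given positive
numbers. Let $\lambda_1, \ldots, \lambda_n \ge 0$ and $\sum \lambda_j = 1$. Then

$$\Big(\sum \lambda_j x_j\Big)\Big(\sum \lambda_j x_j^{-1}\Big) \le A^{2} G^{-2}$$

where $A = \tfrac{1}{2}(x_1 + x_n)$ and $G = (x_1 x_n)^{1/2}$.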


Proof: Observe that the inequality is homogeneous in the sense that it is invariant
with respect to replacing each $x_j$ by a positive multiple $\alpha x_j$. Accordingly it is
possible to assume that $G = 1$ so that $x_n = 1/x_1$. Each $x$ between $x_1$ and $1/x_1$
satisfies

$$x + \frac{1}{x} \le x_1 + \frac{1}{x_1}.$$

It follows that $\sum \lambda_j x_j + \sum \lambda_j x_j^{-1} \le x_1 + 1/x_1 = 2A$. The conclusion follows by an
application of the geometric-arithmetic mean inequality.
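
A quick numerical experiment can make the statement concrete. The sketch below is an illustration only: the function name `kantorovich_sides` and the random test data are assumptions, not part of the note. It evaluates both sides of the inequality and also checks the case of two points with equal weights, where equality holds.

```python
import random

def kantorovich_sides(xs, weights):
    """Return (left side, right side) of the Kantorovich inequality."""
    mean = sum(w * x for w, x in zip(weights, xs))
    inv_mean = sum(w / x for w, x in zip(weights, xs))
    A = (min(xs) + max(xs)) / 2       # arithmetic mean of the extreme values
    G = (min(xs) * max(xs)) ** 0.5    # geometric mean of the extreme values
    return mean * inv_mean, (A / G) ** 2

# Equality case: two points with equal weights.
equal_lhs, equal_rhs = kantorovich_sides([1.0, 4.0], [0.5, 0.5])

# Random trials: the left side should never exceed the right side.
random.seed(1)
worst = float("-inf")
for _ in range(1000):
    xs = sorted(random.uniform(0.1, 10.0) for _ in range(5))
    raw = [random.random() for _ in xs]
    weights = [r / sum(raw) for r in raw]
    lhs, rhs = kantorovich_sides(xs, weights)
    worst = max(worst, lhs - rhs)
print(equal_lhs, equal_rhs, worst)
```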
On The Generalized Inverse Form
of the Equations of Constrained Motion 


Robert Kalaba and Rong Xu 


1. INTRODUCTION. It has often been observed that seemingly abstract concepts 
and principles prove to be of paramount importance in practical applications. The 
close ties between constrained motion and generalized inverses of matrices may be 
a case in point. 

In 1829 C. F. Gauss formulated his celebrated principle of least constraint for 
handling static and dynamic problems for constrained mechanical systems. The 
principle takes the form of a minimization problem. The seemingly abstract notion 
of the generalized inverse of a matrix proves to be crucial in dealing with Gauss’ 
principle and in understanding the complex interactions between applied and 
constraint forces. 

In this note, we present Gauss’ principle and then indicate the role of general- 
ized inverses in its further development. 


2. GAUSS’ PRINCIPLE AND GENERALIZED INVERSES OF MATRICES [1]. 
Consider a system of $p$ particles. Let the mass of the $i$th particle be $m_i$ and the
external force acting on it be $f_i$. We use Cartesian coordinates. If there were no
constraint on the particles, the acceleration of the $i$th particle would be $a_i = f_i/m_i$.
In matrix form this is

$$F = Ma, \tag{1}$$

where, with $n = 3p$,

$$\underset{n\times 1}{F} = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_p \end{bmatrix}, \tag{2}$$

$$\underset{n\times n}{M} = \operatorname{diag}(m_1, m_1, m_1, m_2, m_2, m_2, \ldots, m_p, m_p, m_p), \tag{3}$$

and

$$\underset{n\times 1}{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}. \tag{4}$$


However, if the particles are subject to constraints, a task of analytical mechanics
is to determine the actual acceleration of the $i$th particle, $\ddot{x}_i$, at any time $t$. Gauss’
principle of least constraint [2] states that the actual accelerations $\ddot{x}_1, \ddot{x}_2, \ldots, \ddot{x}_p$
minimize $G$ where

$$G = \sum_{i=1}^{p} m_i (\ddot{x}_i - a_i)^{T} (\ddot{x}_i - a_i), \tag{5}$$

subject to whatever the constraints might be on the accelerations. As usual,
superscript $T$ denotes transposition. Eq. (5) is identical to

$$G = (\ddot{x} - a)^{T} M (\ddot{x} - a), \tag{6}$$

where

$$\underset{n\times 1}{\ddot{x}} = \begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \\ \vdots \\ \ddot{x}_p \end{bmatrix}. \tag{7}$$

In Lagrangian mechanics, the constraints take the form $\varphi_k(t, x, \dot{x}) = 0$, where
$k = 1, 2, \ldots, m$, in which $m$ is the number of constraints. Through differentiation
with respect to $t$, this leads to the constraints on the acceleration, taking the form
of a consistent linear algebraic system

$$A\ddot{x} = b, \tag{8}$$

where $A$ is an $m \times n$ matrix and $b$ is an $m \times 1$ vector. Both $A$ and $b$ may depend
upon the time $t$, the particles’ displacement vector $x$ and the velocity vector $\dot{x}$; and
the matrix $A$ need not be of full rank. Given initial conditions on $x$ and $\dot{x}$, use of
the differentiated form of the constraints does not cause any loss in generality. In
these cases, Gauss’ principle takes the form of minimizing $G$ subject to the linear
constraints (8).

To recast Gauss’ principle, let

$$y = M^{1/2}(\ddot{x} - a), \tag{9}$$

so that

$$\ddot{x} = M^{-1/2} y + a. \tag{10}$$

Consequently, Eqs. (6) and (8) are equivalent to

$$G = y^{T} y, \tag{11}$$

and

$$A M^{-1/2} y = b - Aa. \tag{12}$$

Gauss’ principle is then reduced to the problem of finding the shortest length
vector $y$ such that the consistent linear Eq. (12) is satisfied. As is known [3], the
solution to this problem can be expressed by

$$y = (AM^{-1/2})^{+} (b - Aa), \tag{13}$$

where $(AM^{-1/2})^{+}$ is the Moore-Penrose generalized inverse* of the matrix $AM^{-1/2}$.
This is the only property of the generalized inverse that is used! Substituting
Eq. (9) into Eq. (13) and rearranging it, we obtain the actual acceleration vector $\ddot{x}$
in the form

$$\ddot{x} = a + M^{-1/2}(AM^{-1/2})^{+} (b - Aa). \tag{14}$$


3. AN APPLICATION. Let us use a simple example to illustrate the applicability 
and utility of the formula given by Eq. (14). Suppose we would like to determine 
the equations of motion of a spherical pendulum with length $l$ and mass $m$. The
rectangular coordinates of the mass point are specified by $(x_1, x_2, x_3)$. The
following is a general procedure.

First, we identify the mass matrix and the free motion acceleration vector. In 
this case, they are 


$$\underset{3\times 3}{M} = \begin{bmatrix} m & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & m \end{bmatrix}, \tag{15}$$


*Given an $m \times n$ matrix $B$, there always exists a unique $n \times m$ matrix $B^{+}$, called the Moore-Penrose
generalized inverse of $B$, which satisfies the following four basic properties:

$$BB^{+}B = B; \quad B^{+}BB^{+} = B^{+}; \quad (BB^{+})^{T} = BB^{+}; \quad\text{and}\quad (B^{+}B)^{T} = B^{+}B.$$

A useful secondary property of $B^{+}$ is that $z = B^{+}c$ is the shortest length solution to the least squares
problem $Bz = c$.




and 


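$$\underset{3\times 1}{a} = \begin{bmatrix} 0 \\ -g \\ 0 \end{bmatrix}. \tag{16}$$

Accordingly,

$$\underset{3\times 3}{M^{-1/2}} = \begin{bmatrix} m^{-1/2} & 0 & 0 \\ 0 & m^{-1/2} & 0 \\ 0 & 0 & m^{-1/2} \end{bmatrix}. \tag{17}$$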

Second, we write out all the constraint equations. In this case, to keep the point on 
the sphere, the only constraint is 


$$x_1^2 + x_2^2 + x_3^2 = l^2. \tag{18}$$


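Third, we get the linear restriction on the acceleration vector by differentiating
the constraints once or twice with respect to $t$. In this example, one differentiation
of the constraint Eq. (18) gives

$$x_1\dot{x}_1 + x_2\dot{x}_2 + x_3\dot{x}_3 = 0, \tag{19}$$

and one more differentiation results in

$$x_1\ddot{x}_1 + x_2\ddot{x}_2 + x_3\ddot{x}_3 = -(\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2). \tag{20}$$

Notice that Eq. (20), together with initial conditions on the displacements and
velocities, describes the same constraint on the system as Eq. (18). In matrix form,
Eq. (20) is

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \\ \ddot{x}_3 \end{bmatrix} = -(\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2). \tag{21}$$

Let

$$\underset{1\times 3}{A} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}, \tag{22}$$

$$\underset{3\times 1}{\ddot{x}} = \begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \\ \ddot{x}_3 \end{bmatrix}, \tag{23}$$

and

$$b = -(\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2). \tag{24}$$

Eq. (21) then is reduced to

$$A\ddot{x} = b, \tag{25}$$

which is the required form for the constraints in applying the basic formula (14).
Since

$$\underset{1\times 3}{AM^{-1/2}} = m^{-1/2}\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}, \tag{26}$$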
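it follows that

$$(AM^{-1/2})^{+} = \frac{1}{m^{-1/2}(x_1^2 + x_2^2 + x_3^2)} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \tag{27}$$

Lastly, we substitute Eqs. (16), (17), (27), (24) and (22) into Eq. (14). This leads to

$$\begin{aligned}
\ddot{x} &= a + M^{-1/2}(AM^{-1/2})^{+}(b - Aa) \\
&= \begin{bmatrix} 0 \\ -g \\ 0 \end{bmatrix} + \frac{1}{x_1^2 + x_2^2 + x_3^2}\Big(-(\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2) + g x_2\Big) \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \\
&= \begin{bmatrix} 0 \\ -g \\ 0 \end{bmatrix} + \frac{g x_2 - \dot{x}_1^2 - \dot{x}_2^2 - \dot{x}_3^2}{x_1^2 + x_2^2 + x_3^2} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},
\end{aligned} \tag{28}$$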


which can be verified to be the correct equation of motion for the spherical 
pendulum by using Lagrange’s equations of motion of the first kind or otherwise. 
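
The procedure of this section can be traced in a few lines of code. The following is a sketch under stated assumptions, not the authors' program: the function name `pendulum_acceleration` and the default `g = 9.81` are illustrative, and the code uses the fact that for a single constraint (a $1 \times 3$ row $A$) the Moore-Penrose inverse of a nonzero row vector $v$ is $v^{T}/(v v^{T})$, so the mass cancels as in the closed form for $\ddot{x}$ derived above.

```python
def pendulum_acceleration(x, v, g=9.81):
    """Constrained acceleration of a spherical pendulum from formula (14).

    x: position (x1, x2, x3) on the sphere, v: velocity; gravity acts along -x2.
    """
    r2 = sum(xi * xi for xi in x)              # A A^T for the row A = [x1 x2 x3]
    a = (0.0, -g, 0.0)                         # free-motion acceleration, Eq. (16)
    b = -sum(vi * vi for vi in v)              # right-hand side, Eq. (24)
    Aa = sum(xi * ai for xi, ai in zip(x, a))  # the scalar A a
    factor = (b - Aa) / r2                     # (b - A a) / (x1^2 + x2^2 + x3^2)
    return tuple(ai + factor * xi for ai, xi in zip(a, x))

# A pendulum of length 2 hanging at rest straight down does not accelerate.
rest = pendulum_acceleration((0.0, -2.0, 0.0), (0.0, 0.0, 0.0))

# In motion, the result must satisfy the acceleration constraint, Eq. (20).
pos, vel = (0.6, -0.8, 0.0), (0.0, 0.0, 1.5)
acc = pendulum_acceleration(pos, vel)
residual = sum(p * q for p, q in zip(pos, acc)) + sum(w * w for w in vel)
print(rest, residual)
```

For several constraints $A$ has more than one row and a general pseudoinverse routine would be needed; the row-vector shortcut above covers only the single-constraint case treated here.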


4. CONCLUDING REMARKS. In this note, we have indicated the close connection
between two cultures that are normally viewed as being quite separate: that of
analytical mechanics and that of generalized inverses of matrices. Contributions 
should flow freely from one to the other. In particular, there are pedagogical 
implications for the teaching of generalized inverses early in the undergraduate 
curriculum. Many analytical and computational aspects of the generalized inverse 
form of the equations of constrained motion can be considered now [4]. Automa- 
tion of the entire process, including machine evaluation of needed partial deriva- 
tives [5] and machine evaluation of the generalized inverses, should be both 
challenging and rewarding. 
Off to the Races


Jeffrey Ondich 


You are seated in a restaurant, about to eat some soup. You pick up your spoon.
At that moment, crashing dishes interrupt your concentration. While you look 
away, your hungry eating companion slips the spoon from your hand. A moment 
later, you continue your task where you left off, and plunge your hand into the 
soup. 

Most human beings, even mathematicians, would notice the absence of the 
spoon. But computer programs tend to be less adaptable than their authors. In this 
column, we will take a look at the trouble that can arise when programs that can 
be interrupted at any time try to share resources and communicate with one 
another. 


Processes and Interrupts. On a computer system with one central processing unit 
(CPU), only one program can actually run at a time. A pre-emptive multi-tasking 
system creates the illusion of many programs running simultaneously by forcing the 
programs to take turns. One of the many jobs of a computer’s operating system is 
to schedule processes. A process is a program in the midst of execution. At any 
given time, exactly one process is running—that is, its instructions are being 
executed by the CPU. Other processes are ready to run, and still others are 
blocked or asleep. A blocked process will not be given control of the CPU until 
some condition is met, such as the arrival of data the process requested from a 
disk. 

Once a process is given control of the CPU, how does any other process get 
control back? One way is for the running process to put itself to sleep. For 
example, if a process requests some data from a hard disk, the process will block 
while waiting, and it will be up to the operating system to wake the process up 
when the information arrives. If a process doesn’t give up the CPU voluntarily, it 
has to be forced out. Every so often, typically on the order of once per 60th of a 
second, a small quartz clock sends the CPU an interrupt (a term that beautifully 
exemplifies computer scientists’ fondness for using a verb when a perfectly good 
noun is available). The interrupt causes the CPU to save the running process and 
give control to the operating system, which may then give control to any process 
that is ready to run. 


Race conditions. Multi-tasking allows computer users to run many programs at
once, and allows the CPU to spend most of its time doing useful work. But the
pre-emption and inter-process communication that often come with multi-tasking
can cause trouble, too. Consider the bakery in the restaurant where you had your
soup. Every day, the bakery sells 100 chocolate cakes by taking orders via the 
Internet. When a purchase order comes in, a new process is created to handle the 
order. All such processes have access to a variable named cakesSold, which is 
initialized to 0 every morning. This is what these processes do: 


    If cakesSold < 100,
        Add 1 to cakesSold
        Record the transaction
        Send the order to the delivery truck.
    Otherwise,
        Ask the customer to try again tomorrow.


Now suppose cakesSold = 99, and two orders arrive in quick succession, 
prompting the creation of processes A and B. Consider the following sequence of 
events. 


1. A determines that cakesSold < 100
2. An interrupt occurs, and B gets control
3. B determines that cakesSold < 100
4. B adds 1 to cakesSold
5. B finishes its work, and A gets control
6. A adds 1 to cakesSold...


The value of cakesSold is now 101, so someone is going to have to bake another 
cake. Process A has, so to speak, failed to notice its spoon is missing. The problem 
here is that it takes more than one step to test and alter cakesSold, but interrupts 
can occur any time, leading to race conditions—trouble caused by two processes 
racing to use a shared resource. 
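
The unlucky interleaving can be replayed deterministically. The sketch below is an illustration only and makes one modelling assumption: each order-handling process is written as a Python generator, and every `yield` marks a place where an interrupt is allowed to hand control to the other process. The names `handle_order` and `shop` are hypothetical.

```python
def handle_order(shop):
    """One order-handling process; each yield is a possible interrupt point."""
    has_cake = shop["cakesSold"] < 100   # test the shared variable
    yield "tested"
    if has_cake:
        shop["cakesSold"] += 1           # alter it, trusting the earlier test
        yield "sold"
    else:
        yield "refused"

shop = {"cakesSold": 99}
A, B = handle_order(shop), handle_order(shop)

next(A)   # A determines that cakesSold < 100, then is interrupted
next(B)   # B determines that cakesSold < 100
next(B)   # B adds 1 to cakesSold and finishes
next(A)   # A resumes and adds 1, relying on its stale test
print(shop["cakesSold"])  # 101
```

Real pre-emption is not under the program's control like this; the generators simply make one bad schedule reproducible.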


Semaphores. To correct the problem with the cakes, we need some way to give 
Process A exclusive access to cakesSold for a short time. One way to do this 
involves semaphores, first suggested by Edsger Dijkstra in [1]. A semaphore has a 
non-negative integer value and a collection of processes, all blocked, that are said 
to be “sleeping on the semaphore.” Initially, a semaphore has no processes 
sleeping on it. Any process can perform either of two operations on a semaphore: 


DOWN: If the semaphore’s value is greater than 0, the value is decreased 
by 1. If the value is 0, the value remains 0, and the process performing the 
DOWN goes to sleep and gets added to the semaphore’s collection of 
sleeping processes. 


UP: If the semaphore’s value is 0 and its collection of sleeping processes 
is non-empty, one of those sleepers is awakened. Otherwise, the semaphore’s 
value is increased by 1. 


DOWN and UP can be built into the operating system in such a way that they 
are atomic. That is, like a single machine language instruction, they cannot be 
interrupted. 

Let’s see how a semaphore can be used to solve our cake problem. We will 
follow operating systems literature tradition and call our semaphore “mutex,” in 
reference to the mutually exclusive use of cakesSold we are trying to enforce. We 
will set the value of mutex to 1, which will mean that no process is using cakesSold 
at the moment. Our order-handling procedure now looks like this:


    DOWN(mutex)
    If cakesSold < 100,
        Add 1 to cakesSold
        UP(mutex)
        Record the transaction
        Send the order to the delivery truck
    Otherwise,
        UP(mutex)
        Ask the customer to try again tomorrow


Suppose A and B are using this new semaphore-based approach, and B interrupts 
A as before: 


1. A does a DOWN on mutex, whose value becomes 0
2. A determines that cakesSold < 100
3. An interrupt occurs, and B gets control
4. B does a DOWN on mutex, and thus goes to sleep
5. A gets control, and adds 1 to cakesSold
6. A does an UP on mutex, waking B up


Even if B takes control again now, the danger is past. Process A had exclusive 
access to cakesSold long enough to test it and add 1. 
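
The semaphore-based procedure can be tried out with real threads. This is a minimal sketch, assuming Python's `threading.Semaphore` as a stand-in for the semaphore described above (`acquire` playing the role of DOWN and `release` the role of UP); the names `bakery` and `handle_order` and the `refused` counter are illustrative additions.

```python
import threading

mutex = threading.Semaphore(1)              # value 1: nobody is using cakesSold
bakery = {"cakesSold": 0, "refused": 0}

def handle_order():
    mutex.acquire()                         # DOWN(mutex)
    if bakery["cakesSold"] < 100:
        bakery["cakesSold"] += 1
        mutex.release()                     # UP(mutex)
        # record the transaction and notify the delivery truck (omitted)
    else:
        bakery["refused"] += 1              # bookkeeping for this demo only
        mutex.release()                     # UP(mutex)
        # ask the customer to try again tomorrow (omitted)

orders = [threading.Thread(target=handle_order) for _ in range(250)]
for order in orders:
    order.start()
for order in orders:
    order.join()
print(bakery["cakesSold"], bakery["refused"])
```

However the threads are scheduled, the test and the update of `cakesSold` happen while the semaphore is held, so the count cannot pass 100.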


Sending Messages. Semaphores are easy to misuse, and they do not provide a very 
abstract way of thinking about interprocess communication. A more comfortable 
abstraction is the notion of message passing. We would like process A to be able to 
send messages to and receive messages from process B. Using a message passing 
system, the cake problem could be solved by requiring processes to request 
permission from one another before altering cakesSold. Such an approach would 
be metaphorically more pleasing than the use of semaphores, and thus more 
reliable. 

If our operating system provides us with semaphores, we can use them to 
implement a message passing system. In a sense, we can use the semaphores as a 
programming language, and write message-passing programs in that language. 

Suppose for simplicity that only two processes, still called A and B, will be 
sending messages to one another. We will set aside memory, shared by A and B, to 
hold a “mailbox” for each process. Each mailbox will have enough space to hold 
exactly one message. We will also associate with A’s mailbox two semaphores, 
called fullA and emptyA, whose values will initially be 0 and 1, respectively. B’s 
mailbox will have two similarly named semaphores. From these pieces, we can 
construct the basic operations SEND and RECEIVE. 

When B wants to SEND a message to A, here is what B will do: 


DOWN(emptyA) 
Write the message into A’s mailbox 
UP(fullA) 


To RECEIVE a message from B, A does the following: 


DOWN(fullA) 
Read the message in A’s mailbox 
UP(emptyA) 
Any time before B has finished SENDing, fullA will have value 0, so A will go
to sleep if it tries to RECEIVE. Similarly, if B tries to SEND a second message
before A has RECEIVE’d the first, B’s DOWN on emptyA will put B to sleep.
Thus, the problem of B and A trying to SEND and RECEIVE at the same time 
will not take place, nor will the problem of B SENDing too many messages for A’s 
mailbox to hold. 
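
The one-message mailbox can be sketched directly from the SEND and RECEIVE steps above. The code is an illustration rather than a definitive implementation: the class name `Mailbox`, its attribute names, and the five sample messages are assumptions, and `threading.Semaphore` again stands in for the semaphores of the text.

```python
import threading

class Mailbox:
    """A mailbox with room for exactly one message, guarded by two semaphores."""

    def __init__(self):
        self.full = threading.Semaphore(0)    # plays the role of fullA
        self.empty = threading.Semaphore(1)   # plays the role of emptyA
        self.slot = None

    def send(self, message):
        self.empty.acquire()                  # DOWN(emptyA)
        self.slot = message                   # write the message into the mailbox
        self.full.release()                   # UP(fullA)

    def receive(self):
        self.full.acquire()                   # DOWN(fullA)
        message = self.slot                   # read the message in the mailbox
        self.empty.release()                  # UP(emptyA)
        return message

mailbox_a = Mailbox()
received = []

def process_a():
    for _ in range(5):
        received.append(mailbox_a.receive())

def process_b():
    for i in range(5):
        mailbox_a.send(f"order {i}")

thread_a = threading.Thread(target=process_a)
thread_b = threading.Thread(target=process_b)
thread_a.start()
thread_b.start()
thread_a.join()
thread_b.join()
print(received)
```

Because the mailbox holds one message and there is one sender and one receiver, the messages arrive in the order they were sent, whichever thread happens to run first.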

Even though semaphores and message-passing systems can prevent race condi- 
tions, they can also be misused, causing a related problem called deadlock. In the 
restricted, two-process world of the examples above, both A and B might invoke 
SEND twice before either invoked RECEIVE, or both might perform a DOWN on 
a semaphore whose value was already zero. Either way, A and B would be asleep, 
each waiting for a wake-up call from the other. 


Equivalence. Semaphores and message passing are not the only abstractions 
available for regulating interprocess communication. There are event counters, 
monitors, rendezvous, sequencers, path expressions, serializers, and more (see [2] for 
details). Each is just the right abstraction for some communication task. But, as it 
turns out, they are all equivalent. 

We have already seen how semaphores can be used to implement a message 
passing system. To show that semaphores and message passing are equivalent, we 
need to construct semaphores from a message passing system. 

Suppose we have atomic operations SEND and RECEIVE for message-passing, 
but our operating system provides us with no semaphores. To mimic semaphore 
behavior, we can create a special process—the semaphore Boss, if you will—to 
coordinate calls to UP and DOWN. For each semaphore, the Boss will maintain a 
value and a list of sleeping processes. 

To perform an UP on a semaphore S, process A will SEND to the Boss a 
message containing the identity of the semaphore in question and a marker to 
indicate that A wants to do an UP. (To do a DOWN, A sends the Boss the same 
message with a different marker.) Then A invokes RECEIVE, and waits for a 
return message from the Boss. The content of the return message is irrelevant; 
what matters is that A will sleep until the message arrives. 

The following procedures describe the actions of the Boss upon receiving a 
DOWN(S) message from A: 


    If the value of S is greater than 0
        Subtract 1 from the value of S
        SEND a message to A
    Otherwise,
        Add A to the list of processes sleeping on S


and an UP(S) message from A:


    If the list of processes sleeping on S is empty
        Add 1 to the value of S
    Otherwise,
        Remove one of the sleepers from the list
        of processes sleeping on S, and SEND that
        process a message
    SEND a message to A


All semaphore operations are performed while the Boss is running, and so the 
unpredictable order of events that can lead to race conditions will not be a 
problem here. 
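
The Boss can be sketched with threads and queues. The code below is a hedged illustration, not the column's own program: the class name `Boss`, the `"go"` reply, and the use of `queue.Queue` objects as the SEND/RECEIVE channels are all assumptions. It follows the UP rule as defined in the Semaphores section (wake a sleeper if there is one, otherwise add 1 to the value) and always replies to the process that performed the UP, so that caller does not sleep indefinitely.

```python
import queue
import threading

class Boss:
    """Serves DOWN/UP requests for one semaphore using only message queues."""

    def __init__(self, value):
        self.value = value
        self.sleepers = []                   # reply queues of sleeping processes
        self.inbox = queue.Queue()           # messages SENT to the Boss
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            op, reply = self.inbox.get()     # RECEIVE the next request
            if op == "DOWN":
                if self.value > 0:
                    self.value -= 1
                    reply.put("go")          # SEND a message to the caller
                else:
                    self.sleepers.append(reply)   # caller keeps sleeping
            else:                            # op == "UP"
                if self.sleepers:
                    self.sleepers.pop(0).put("go")   # wake one sleeper
                else:
                    self.value += 1
                reply.put("go")              # release the caller of UP

    def _call(self, op):
        reply = queue.Queue()
        self.inbox.put((op, reply))          # SEND the request to the Boss
        reply.get()                          # RECEIVE: sleep until the Boss answers

    def down(self):
        self._call("DOWN")

    def up(self):
        self._call("UP")

boss = Boss(1)                               # used here as a mutex
total = {"n": 0}

def worker():
    for _ in range(200):
        boss.down()
        total["n"] += 1                      # protected by the Boss's semaphore
        boss.up()

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(total["n"])
```

Only the Boss's own thread ever touches `value` and `sleepers`, which is the point of the construction: the semaphore's bookkeeping is serialised by the order in which messages are received.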
References. Mutual exclusion, semaphores, message passing, and deadlock are
discussed in virtually every elementary operating systems textbook. Two good 
examples are [5] and [4]. 

Dijkstra’s original paper [1] is worth reading to understand the historical context 
in which semaphores were proposed. If your operating system doesn’t even give 
you semaphores, you can still enforce mutual exclusion using the CPU-hogging 
technique of busy waiting (see [3]). 
Remote from human passions, remote even from the pitiful facts of
nature, the generations have gradually created an ordered cosmos,
where pure thought can dwell as in its natural home and where one,
at least, of our nobler impulses can escape from the dreary exile of
the actual world.

-- Russell
In recent years, elliptic curves have played a leading role in number theory, most
famously in Wiles’ program to prove Fermat’s last theorem. However, since these
developments are highly technical, it may be useful to look back to earlier times,
when elliptic curves led a simpler life. For about 1500 years, from the time of
Diophantus to Newton, elliptic curves were known only as curves defined by 
certain cubic equations. This put them just a step beyond the conic sections, and 
some of their geometric and arithmetic properties can in fact be viewed as 
generalisations of properties of conics. In particular, it is possible to find rational 
solutions of both quadratic and cubic equations by simple geometric constructions. 

It was only with the development of calculus, in the 17th century, that sharp 
differences between conics and elliptic curves began to emerge. Conic sections can 
be parametrised by rational functions. For example, the circle $x^2 + y^2 = 1$ is
parametrised by

$$x = \frac{1 - t^2}{1 + t^2}, \qquad y = \frac{2t}{1 + t^2},$$


but the elliptic curves cannot. Their simplest parametrising functions are elliptic 
functions, which arise in calculus as the inverses of elliptic integrals, so-called 
because a typical example is the integral for the arc length of the ellipse. It is for 
this fairly accidental reason that they are called elliptic curves—an unfortunate 
accident since the ellipse itself is not an elliptic curve. 

The difference between conics and elliptic curves was “felt” in the 17th century 
in the apparent intractability of elliptic integrals, though the parametrisation of 
cubic curves was not known at that time. The idea of inverting elliptic integrals to 
create elliptic functions had to wait until the early 19th century. The nonrationality 
of elliptic curves was not fully understood until the mid-19th century, when the 
introduction of complex coordinates revealed a topological difference between 
them and conics. This brings us within sight of the modern view of elliptic 
curves—a remarkable synthesis of number theory, geometry, algebra, analysis and 
topology. In what follows I shall attempt to describe what led up to this state of 
affairs. 


Diophantus. Very little is known about Diophantus except that he lived sometime
between 150 AD and 350 AD and was a wizard at finding rational solutions to
polynomial equations in two or more variables. His Arithmetica (available in the
English edition of Heath [4]) contains the solutions of hundreds of equations,
among them the following instructive examples.

1. A rational solution of $x^2 + y^2 = 16$, other than an obvious one such as $x = 0$,
$y = 4$, is found by solving the simultaneous equations

$$x^2 + y^2 = 16,$$
$$y = 2x - 4,$$

which yield the solution $x = 16/5$, $y = 12/5$ (Heath [4], p. 145).

2. A rational solution of $x^3 - 3x^2 + 3x + 1 = y^2$, other than the obvious one
$x = 0$, $y = 1$, is found by solving the simultaneous equations

$$x^3 - 3x^2 + 3x + 1 = y^2,$$
$$y = \tfrac{3}{2}x + 1,$$

which yield the solution $x = 21/4$, $y = 71/8$ (Heath [4], p. 242).

How did Diophantus choose the linear equations in these two examples? The 
simplest explanation is geometric, although he makes no mention of geometry. 

In the first example the linear equation represents a line through the “obvious”
rational point $(0, -4)$. Its slope is not important, since any line through $(0, -4)$ with
rational slope $t$ will meet the circle at a second rational point $(8t/(1 + t^2),
(4t^2 - 4)/(1 + t^2))$. Conversely, all rational points on the circle are obtainable in
this way, so Diophantus has essentially parametrised the rational points on the
circle by rational functions of a rational parameter $t$.
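
The chord construction is easy to reproduce in exact rational arithmetic. The sketch below is an illustration, with the hypothetical function name `second_point`: it intersects the line $y = tx - 4$ (the family containing Diophantus' line $y = 2x - 4$) with the circle $x^2 + y^2 = 16$ and returns the second intersection point.

```python
from fractions import Fraction

def second_point(t):
    """Second intersection of the line y = t*x - 4 with x^2 + y^2 = 16."""
    t = Fraction(t)
    # Substituting gives (1 + t^2) x^2 - 8 t x = 0, so x = 0 or x = 8t/(1 + t^2).
    x = 8 * t / (1 + t * t)
    y = t * x - 4
    return x, y

x, y = second_point(2)
print(x, y)             # 16/5 12/5
print(x * x + y * y)    # 16
```

Every rational slope gives another rational point, for instance `second_point(Fraction(3, 7))`.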

The linear equation in the second example has an even stronger geometric 
smell. It is the tangent to $x^3 - 3x^2 + 3x + 1 = y^2$ at the “obvious” rational point
(0,1). Here there is no option about the slope because a line has to meet a cubic 
curve in two rational points for its third intersection to be rational. When only one 
rational point is known, this forces us to use the tangent, which is the line through 
two “coincident” points. 
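
The tangent construction can be checked the same way. This is a small sketch with illustrative names (`cubic`, `slope`, `x3`, `y3`): the slope of the tangent at $(0, 1)$ comes from implicit differentiation, $2y\,y' = 3x^2 - 6x + 3$, and substituting the tangent into the curve leaves a cubic in $x$ with a double root at $x = 0$.

```python
from fractions import Fraction

def cubic(x):
    return x**3 - 3 * x**2 + 3 * x + 1

# Tangent at (0, 1): 2*y*y' = 3x^2 - 6x + 3 gives y' = 3/2 there.
slope = Fraction(3, 2)

# Substituting y = slope*x + 1 into y^2 = cubic(x) gives
#   x^3 - (3 + slope^2) x^2 + (3 - 2*slope) x = 0.
# The linear coefficient vanishes, so x = 0 is a double root and the
# remaining root is x = 3 + slope^2.
x3 = 3 + slope**2
y3 = slope * x3 + 1
print(x3, y3)   # 21/4 71/8
```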

It is possible, of course, that Diophantus discovered these facts purely alge- 
braically, and did not notice their geometric interpretation. However, that would 
be a truly amazing departure from the Greek mathematical culture of his time. 
Even in the more algebraic culture of the 17th century, Fermat and Newton 
immediately recognised Diophantus’ work as geometry, with Newton [6] explicitly 
interpreting Diophantus’ solutions as chord and tangent constructions. Later 
discoveries added more weight to the geometric interpretation, as we shall see 
below. 




Fermat and Newton. Fermat was the first mathematician to make significant 
progress in number theory beyond Diophantus. Among his many discoveries were 
methods for proving nonexistence of integer or rational solutions for certain 
equations. For example, he proved that there are no positive rationals $a, b, c$ such
that

$$a^4 - b^4 = c^2.$$

This implies in particular that no positive integer fourth powers sum to a fourth
power (the $n = 4$ case of Fermat’s last theorem), but it is also a statement about an
elliptic curve. It says that there are no nontrivial rational points on the curve

$$y^2 = 1 - x^4,$$

since a rational point $(x, y) = (q/r, p/r)$ with $p, q \ne 0$ and

$$\frac{p^2}{r^2} = 1 - \frac{q^4}{r^4}$$

gives nonzero integers $a = r$, $b = q$, $c = pr$ with $a^4 - b^4 = c^2$.
Now I know I said that elliptic curves are cubics, but they are cubic in a suitable
coordinate system. Any quartic curve of the form

$$y^2 = (x - \alpha)(x - \beta)(x - \gamma)(x - \delta)$$

can be rewritten

$$\left(\frac{y}{(x - \alpha)^2}\right)^{2} = \left(1 - \frac{\beta - \alpha}{x - \alpha}\right)\left(1 - \frac{\gamma - \alpha}{x - \alpha}\right)\left(1 - \frac{\delta - \alpha}{x - \alpha}\right)$$

and hence it is cubic in the coordinates

$$X = \frac{1}{x - \alpha} \quad\text{and}\quad Y = \frac{y}{(x - \alpha)^2}.$$

In particular, $y^2 = 1 - x^4$ is a cubic $Y^2 = 4X^3 - 6X^2 + 4X - 1$ in the
coordinates $X = 1/(1 - x)$, $Y = y/(1 - x)^2$. Notice that this is an appropriate
coordinate change from the point of view of number theory, because it makes the
rational points $(x, y)$ on one curve correspond to the rational points $(X, Y)$ on the
other. Such a coordinate change is called birational.
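
The coordinate change for $y^2 = 1 - x^4$ can be verified with exact fractions. The sketch below is illustrative (the function names `to_cubic`, `to_quartic`, `quartic_rhs` and `cubic_rhs` are assumptions): it checks that dividing $1 - x^4$ by $(1 - x)^4$ gives the stated cubic in $X = 1/(1 - x)$, and that the map and its inverse are given by rational expressions, which is what makes rational points correspond. The point with $x = 1$ is excluded because the substitution is undefined there.

```python
from fractions import Fraction

def quartic_rhs(x):
    return 1 - x**4

def cubic_rhs(X):
    return 4 * X**3 - 6 * X**2 + 4 * X - 1

def to_cubic(x, y):
    """(x, y) on y^2 = 1 - x^4  ->  (X, Y) on Y^2 = 4X^3 - 6X^2 + 4X - 1."""
    return 1 / (1 - x), y / (1 - x) ** 2

def to_quartic(X, Y):
    """Inverse map: x = 1 - 1/X, y = Y / X^2."""
    return 1 - 1 / X, Y / X**2

# The identity (1 - x^4)/(1 - x)^4 = 4X^3 - 6X^2 + 4X - 1 at rational samples.
samples = [Fraction(k, 5) for k in range(-12, 13) if k != 5]
identity_holds = all(
    quartic_rhs(x) / (1 - x) ** 4 == cubic_rhs(1 / (1 - x)) for x in samples
)

# The rational point (0, 1) on the quartic maps to (1, 1) on the cubic and back.
point = to_cubic(Fraction(0), Fraction(1))
print(identity_holds, point, to_quartic(*point))
```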
Newton made the surprising discovery that all cubic equations in $x$ and $y$ can
be reduced to the form

$$Y^2 = X^3 + aX + b$$

by a birational coordinate transformation. In fact, the transformations he used 
were simply projections. He called this “genesis of curves by shadows”. His result 
can be viewed as an analogue of the well known theorem that second degree 
curves are conic sections and hence, in nondegenerate cases, projections of the 
circle. The degenerate cubic curves are those for which the right hand side
$X^3 + aX + b$ has a repeated factor. The corresponding repeated root $X = \alpha$ is
either a double point (Fig. 1) or cusp (Fig. 2) of the curve, and by drawing a line of
slope $t$ through this point we obtain the coordinates of the general point on the
curve as rational functions of $t$.


Figure 1. Cubic with double point. 
Figure 2. Cubic with cusp.


Figure 3. Nonsingular cubic. 


The curves for which $X^3 + aX + b$ has no repeated factor cannot be
parametrised by rational functions, and are what we now call elliptic curves
(Fig. 3).


Elliptic integrals. Early in the development of integral calculus, mathematicians 
encountered the problem of “rationalising” square roots of polynomials. For 
example, to find the area or arc length of a circle one finds an integral involving
$\sqrt{1 - x^2}$. This can be rationalised by the “Diophantine” substitution $x =
(1 - t^2)/(1 + t^2)$, and in fact Jakob Bernoulli [1], in a similar situation, actually
attributed the substitution to Diophantus. He used it to obtain the expression

$$\frac{\pi}{4} = \int_0^1 \frac{dt}{1 + t^2},$$

whence he obtained the famous series

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$$

by expanding $1/(1 + t^2)$ in a geometric series and integrating term by term.
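
For readers who want the intermediate steps, here is a short worked version of the two computations just described. It is a sketch of the standard argument, not a transcription of Bernoulli's text; the justification for integrating term by term all the way up to $t = 1$ (for instance via Abel's theorem) is taken for granted.

```latex
% Rationalising \sqrt{1 - x^2} with the substitution x = (1 - t^2)/(1 + t^2):
1 - x^2 = \frac{(1 + t^2)^2 - (1 - t^2)^2}{(1 + t^2)^2} = \frac{4t^2}{(1 + t^2)^2},
\qquad
\sqrt{1 - x^2} = \frac{2t}{1 + t^2} \quad (t \ge 0).

% Geometric series for the integrand, then term-by-term integration:
\frac{1}{1 + t^2} = \sum_{k=0}^{\infty} (-1)^k t^{2k} \quad (|t| < 1),
\qquad
\int_0^1 \frac{dt}{1 + t^2} = \sum_{k=0}^{\infty} \frac{(-1)^k}{2k + 1}
  = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4}.
```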

Integrals involving square roots of cubic or quartic polynomials proved more 
intractable. They were called elliptic integrals because one of them expresses the 
arc length of the ellipse. Cubics and quartics were lumped together because of 
birational equivalences between them, as noted above for $y^2 = 1 - x^4$ and $Y^2 =
4X^3 - 6X^2 + 4X - 1$. Such integrals arise from a great number of natural
geometric and mechanical problems, so a lot of effort was expended on them, but 
without success. 

Perhaps the first to see why rationalisation might be impossible was Jakob 
Bernoulli [2], who noted that a rationalisation of $\sqrt{1 - x^4}$, at least by a rational
function $x = f(t)$ with rational coefficients, would violate Fermat’s theorem on the
nonexistence of positive integer solutions of $a^4 + b^4 = c^2$. In fact, it can be shown
that $\sqrt{1 - x^4}$ cannot be rationalised by any rational function $x = f(t)$, by
repeating Fermat’s argument with polynomials in place of integers, so Jakob Bernoulli
was on the right track. However, this type of argument was not used until the 19th 
century, so the nature of elliptic integrals remained unclear until then (when ideas 
not only from number theory, but also from analysis and topology, were directed at 
the problem). 


Elliptic functions. In the 1820s, Abel and Jacobi finally saw what to do with
elliptic integrals: invert them. Instead of studying the integral

$$u = g^{-1}(x) = \int_0^x \frac{dt}{\sqrt{t^3 + at + b}},$$

say, study its inverse function $x = g(u)$. The gain in simplicity is comparable to
studying the function $x = \sin u$ instead of the integral $\sin^{-1} x = \int_0^x dt/\sqrt{1 - t^2}$.
In particular, instead of a multi-valued integral $g^{-1}(x)$, one has a periodic function
$x = g(u)$.

The difference between sin u and g(u) is that the periodicity of g(u) cannot be 
properly seen until complex values of the variables are admitted, at which stage it 
emerges that $g(u)$ has two periods. That is, there are nonzero $\omega_1, \omega_2 \in \mathbb{C}$, with
$\omega_1/\omega_2 \notin \mathbb{R}$, such that

$$g(u) = g(u + \omega_1) = g(u + \omega_2).$$
The two periods can be brought to light in various ways. One method, originating
with Eisenstein [1847] and commonly used today, is to write down a function that
obviously has periods $\omega_1$ and $\omega_2$, namely

$$g(u) = \sum_{m,n \in \mathbb{Z}} \frac{1}{(u + m\omega_1 + n\omega_2)^2},$$

and derive its properties by manipulation of infinite series. Eventually one finds
that $g^{-1}(x)$ is an integral of the type we started with.

A more insightful approach, though harder to make rigorous, is to study the
behaviour of the integrand $1/\sqrt{t^3 + at + b}$ as $t$ varies over the complex plane.
Following Riemann [7], and viewing the 2-valued “function” $1/\sqrt{t^3 + at + b}$ as a
2-sheeted surface over $\mathbb{C}$, one finds that there are two independent closed paths of
integration, over which the integrals are $\omega_1$ and $\omega_2$. This accounts for the periods
$\omega_1$ and $\omega_2$ of the inverse function $g(u)$.
Since $g(u) = x$, it follows by basic calculus that

$$g'(u) = \frac{dx}{du} = \frac{1}{du/dx} = \frac{1}{1/\sqrt{x^3 + ax + b}} = \sqrt{x^3 + ax + b} = y,$$


so $x = g(u)$, $y = g'(u)$ gives a parametrisation of the curve $y^2 = x^3 + ax + b$. With
a little more work it can be shown that $u \mapsto (g(u), g'(u))$ is in fact a continuous
one-to-one correspondence between $\mathbb{C}/\langle\omega_1, \omega_2\rangle$ and the curve. $\mathbb{C}/\langle\omega_1, \omega_2\rangle$ is
the quotient of $\mathbb{C}$ by the subgroup generated by $\omega_1$ and $\omega_2$ and is topologically a
torus, hence so is the curve $y^2 = x^3 + ax + b$. This is the deeper reason why
elliptic curves are not rationally parametrisable: a curve parametrised by rational
functions $x = p(u)$, $y = q(u)$ is the topological image of the completed plane
$\mathbb{C} \cup \{\infty\}$ of $u$ values, and $\mathbb{C} \cup \{\infty\}$ is topologically a sphere.

Another consequence of the parametrisation $x = g(u)$, $y = g'(u)$ is that the
curve $y^2 = x^3 + ax + b$ is an abelian group. The "sum" of points with parameter
values $u_1, u_2$ is simply the point with parameter value $u_1 + u_2$. Under this
definition of sum, the curve is isomorphic to the group $\mathbb{C}/\langle\omega_1, \omega_2\rangle$. Amazingly,
there is an equivalent definition of the sum that Diophantus would have understood
(and which helps to explain why elliptic functions are useful in number
theory): the sum of the points $P_1$ and $P_2$ is simply the reflection, in the $x$-axis, of
the third point on the curve collinear with $P_1$ and $P_2$ (Fig. 4). For an explanation
of this fact we must refer the reader to a recent book on elliptic curves, such as
Koblitz [5]. In the same book you will find many beautiful modern results on
elliptic curves, motivated by ancient problems in number theory and geometry.


Figure 4. Addition of points on an elliptic curve (from Koblitz [5]).
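The geometric rule for the sum (reflect the third collinear point in the $x$-axis) can be tried out with exact rational arithmetic. The following is only a minimal sketch, not taken from the article: the function name `add_points`, the use of the tangent line when the two points coincide, and the sample curve $y^2 = x^3 + 17$ are all illustrative assumptions. It relies on the standard fact that a line of slope $m$ meets the curve in three points whose $x$-coordinates sum to $m^2$.

```python
from fractions import Fraction

def add_points(p, q, a):
    """Sum of two finite points on y^2 = x^3 + a*x + b (hypothetical helper).

    b does not appear because it is fixed by the points themselves.
    The point at infinity is not modelled; a vertical line raises an error.
    """
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and y1 == -y2:
        raise ValueError("vertical line: the sum is the point at infinity")
    if p == q:
        # Assumption: for a repeated point the chord is replaced by the tangent.
        m = Fraction(3 * x1 * x1 + a, 2 * y1)
    else:
        m = Fraction(y2 - y1, x2 - x1)
    # The three x-coordinates on the line sum to m^2.
    x3 = m * m - x1 - x2
    # The third collinear point is (x3, y1 + m*(x3 - x1)); reflect it in the x-axis.
    y3 = m * (x1 - x3) - y1
    return (x3, y3)
```

On the sample curve $y^2 = x^3 + 17$ (so $a = 0$), the points $(-2, 3)$ and $(-1, 4)$ lie on a line of slope 1, and `add_points((-2, 3), (-1, 4), 0)` gives the point $(4, -9)$, which again satisfies the curve equation since $4^3 + 17 = 81$.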


REFERENCES


1. Bernoulli, Jakob (1696) Positionum de seriebus infinitis pars tertia. Werke, 4, 85-106.
2. Bernoulli, Jakob (1704) Positionum de seriebus infinitis ... pars quinta. Werke, 4, 127-147.
3. Eisenstein, G. (1847) Beiträge zur Theorie der elliptischen Functionen. J. reine angew. Math. 35, 137-274.
4. Heath, T. L. (1910) Diophantus of Alexandria, Cambridge University Press.
5. Koblitz, N. (1985) Introduction to Elliptic Curves and Modular Forms, Springer-Verlag, New York.
6. Newton, I. (late 1670s) De resolutione quaestionum circa numeros. Math. Papers 4, 110-115.
7. Riemann, G. B. H. (1851) Grundlagen für eine allgemeine Theorie der Functionen einer veränderlichen complexen Grösse. Werke, 2nd ed., 3-48.




Department of Mathematics 
Monash University 

Clayton 3168 

AUSTRALIA 
stillwell@monash.edu.au. 


Without the concepts, methods and results found and developed by 
previous generations right down to Greek antiquity one cannot 
understand either the aims or the achievements of mathematics in 


the last fifty years. 


—H. Weyl (in 1950) 
PROBLEMS AND SOLUTIONS


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


Proposed problems should be sent to the MONTHLY PROBLEMS address given on the 
inside front cover. Please include solutions and relevant references. Three copies 
of all items needed to evaluate the problem should be sent. 


Solutions of published problems should arrive at the MONTHLY PROBLEMS address 
given on the inside front cover before April 30, 1996. If possible, solutions should 
be typed with double spacing. Two copies suffice. Several solutions may be mailed 
together, but they should be on separate sheets of paper. The problem number and 
the solver’s name and mailing address should appear on each solution. A mailing 
label should be included if an acknowledgment is desired. 


The published solution is likely to be based on a solution that is complete and
correct. Additional information, such as references to other appearances of the 
problem or its solution, is also welcome. 


An asterisk (*) after the number of a problem, or part of a problem, indicates that 
no solution is currently available. 


PROBLEMS 


10480. Proposed by Shmuel Rosset, Tel Aviv University, Ramat Aviv, Israel. 


The standard proof of Cayley's theorem shows that $S_n$, the symmetric group of degree
$n$, contains as subgroups every group of order $n$. Which groups of order $n$ are contained in
$A_n$, the alternating group of degree $n$?


10481. Proposed by Frank Schmidt, Arlington, VA. 


Let f(n) denote the number of n by n matrices whose entries are 0 or 1 that are positive 
semi-definite. 


Let g(n) denote the number of n by n matrices whose entries are 0 or 1 that are positive 
definite. 


Evaluate f(n) and g(n). 
10482. Proposed by Emre Alkan (student), Bosphorus University, Istanbul, Turkey, and
Murray S. Klamkin, University of Alberta, Edmonton, Alberta, Canada.


Given a regular $n$-gonal pyramid with apex $P$ and base $A_1 A_2 \ldots A_n$, denote $\angle A_i P A_{i+1}$
by $\alpha$ with $0 < \alpha < 2\pi/n$. If points $B_i$ are chosen on the rays $PA_i$ ($i = 1, 2, \ldots, n$),
determine the maximum and minimum values of

$$\frac{|PB_1| + |PB_2| + \cdots + |PB_n|}{|B_1B_2| + |B_2B_3| + \cdots + |B_nB_1|}.$$


10483. Proposed by Stanley Rabinowitz, Westford, MA. 


Given an odd positive integer $n$, let $A_1, A_2, \ldots, A_n$ be a regular $n$-gon with circumcircle
$\Gamma$. A circle $O_i$ with radius $r$ is drawn externally tangent to $\Gamma$ at $A_i$ for $i = 1, 2, \ldots, n$. Let
$P$ be any point on $\Gamma$ between $A_n$ and $A_1$. A circle $C$ (with any radius) is drawn externally
tangent to $\Gamma$ at $P$. Let $t_i$ be the length of the common external tangent between the circles
$C$ and $O_i$. Prove that

$$\sum_{i=1}^{n} (-1)^i t_i = 0.$$


10484. Proposed by N. Bebiano and J. da Providéncia, Universidade de Coimbra, Coimbra, 
Portugal. 


Let $n > 3$, and let $\alpha = (\alpha_1, \ldots, \alpha_n)$ and $\beta = (\beta_1, \ldots, \beta_n)$ be complex row vectors such
that $\{\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_n\}$ are all distinct. Consider the $n!$ complex numbers (counting
multiplicities)

$$z_\sigma = \prod_{i=1}^{n} (\alpha_i - \beta_{\sigma(i)})$$

where $\sigma$ runs through all permutations of $\{1, \ldots, n\}$. Let $P(\alpha, \beta)$ denote the convex hull of
the $z_\sigma$. Prove that $P(\alpha, \beta)$ is a line segment if and only if all the $\alpha_i$ and $\beta_i$ lie on a common
circle or straight line.


10485. Proposed by David Bradley, Simon Fraser University, Burnaby, B. C., Canada. 


Find the real numbers $r$ that satisfy the equation

$$\int_0^\infty \frac{dx}{(1 + x^r)^r} = 1$$

for $r \in \mathbb{R}$ with $r > 1$.


10486. Proposed by Joseph H. Silverman, Brown University, Providence, RI.


Let $a, b > 0$ and $\alpha > 1$ be real numbers, and define a function

$$Z(s) = \sum_{n \in \mathbb{Z}} \frac{1}{(a\alpha^n + b\alpha^{-n})^s}$$

for $s \in \mathbb{C}$, $\Re(s) > 0$.
(a) Prove that Z(s) has a meromorphic continuation to all of C. 
(b) Find the poles of Z(s). 
(c) Find the residues of Z(s) at its poles. 
SOLUTIONS


Positive Deformations of the Cauchy Matrix 


10265 [1992, 957]. Proposed by Bjorn Poonen (student), University of California, Berkeley,
CA.


Let $a_1, \ldots, a_n, b_1, \ldots, b_n, \alpha$ be real numbers with $b_1, \ldots, b_n$ and $\alpha$ all positive. Prove

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \frac{a_i a_j}{(b_i + b_j)^\alpha} \ge 0.$$


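Solution I by Donald A. Darling, Newport Beach, CA. Let $\mu$ be a positive measure on
$[0, \infty)$, and let its Laplace transform $f(s)$ have an abscissa of convergence $s_0$: $f(s) =
\int_0^\infty e^{-st}\,\mu(dt)$, $s > s_0$. If $b_i > s_0$, $i = 1, 2, \ldots, n$ and $a_i$ is real, $i = 1, 2, \ldots, n$, then

$$\sum_{i,j=1}^{n} a_i a_j f(b_i + b_j) = \sum_{i,j=1}^{n} a_i a_j \int_0^\infty e^{-(b_i + b_j)t}\,\mu(dt)
= \int_0^\infty \Big(\sum_{i=1}^{n} a_i e^{-b_i t}\Big)^2 \mu(dt) \ge 0.$$

In this problem, $\mu(dt) = \frac{1}{\Gamma(\alpha)}\, t^{\alpha - 1}\,dt$ and $s_0 = 0$.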
Solution II by David G. Cantor, Del Mar, CA. Note first that it suffices to consider the
case in which the $b_i$ are distinct, say $b_1 < b_2 < \cdots < b_n$, since for every fixed choice of
$a_1, a_2, \ldots, a_n$, the given expression is continuous in the $b_i$.
We first prove


Lemma. Suppose that $u_1, u_2, \ldots, u_n$ are real numbers, not all zero. Then, the function

$$f(x) = \sum_{j=1}^{n} \frac{u_j}{(b_j + x)^\alpha}$$

has at most $n - 1$ positive zeros $x$.


Proof. Use induction on $n$. The result is clear when $n = 1$. Suppose that $n > 1$ and put

$$g(x) = (b_n + x)^\alpha f(x) = u_n + \sum_{j=1}^{n-1} u_j \Big(\frac{b_n + x}{b_j + x}\Big)^\alpha.$$

Then

$$g'(x) = -\alpha \sum_{j=1}^{n-1} u_j \Big(\frac{b_n + x}{b_j + x}\Big)^{\alpha - 1} \frac{b_n - b_j}{(b_j + x)^2}
= -\alpha (b_n + x)^{\alpha - 1} \sum_{j=1}^{n-1} u_j \frac{b_n - b_j}{(b_j + x)^{\alpha + 1}}.$$

The inductive hypothesis tells us that $g'(x)$ has at most $n - 2$ positive zeros, so by Rolle's
theorem $g(x)$ and $f(x)$ have at most $n - 1$ positive zeros. This completes the proof of the lemma.

We turn now to the main result. First note that if $A$ is the $n$ by $n$ matrix whose $(i, j)$ entry
is $1/(b_i + b_j)^\alpha$, then $\det A \ne 0$. Indeed, if a nonzero linear combination of its columns
were zero, say

$$\sum_{j=1}^{n} \frac{u_j}{(b_i + b_j)^\alpha} = 0$$

for $1 \le i \le n$, then the function $f(x)$ of the lemma would have $n$ zeros $b_1, b_2, \ldots, b_n$.
This determinant is a nonzero continuous function of its parameters as long as they satisfy
the specified inequalities. Thus, the sign for all $\alpha > 0$ can be found by looking at $\alpha = 1$.
In this case, $A$ is a Cauchy matrix whose determinant is

$$\frac{\prod_{i=1}^{n} \prod_{j=i+1}^{n} (b_i - b_j)^2}{\prod_{i=1}^{n} \prod_{j=1}^{n} (b_i + b_j)},$$

which is positive.
This shows that the determinant of the given quadratic form in the $a_i$ and all of its
principal subdeterminants are positive. A standard criterion for a symmetric matrix to be
positive definite (see F. R. Gantmacher, The Theory of Matrices, Chelsea, 1960, vol. I,
Theorem X.3, p. 306 or Roger A. Horn & Charles R. Johnson, Matrix Analysis, Cambridge,
1985, Theorem 7.2.5, p. 404) shows that

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \frac{a_i a_j}{(b_i + b_j)^\alpha} \ge 0.$$
Editorial comment. Frank Schmidt noted that the solution could be extracted from Lemmas
5 & 6 of R. Bapat, "Multinomial probabilities, permanents and a conjecture of Karlin
and Rinott", Proc. Amer. Math. Soc. 102 (1988), 467-472. These results may be summarized
by the following statement.


Theorem. If $B = (b_{ij})$ is symmetric with real positive entries and precisely one positive
eigenvalue, then $(1/b_{ij}^\alpha)$ is positive semidefinite for all $\alpha > 0$.


To apply the result in this case, take $B = (b_i + b_j)$. The matrix $B$ has rank at most 2,
allowing easy analysis of its spectrum.

The GCHQ Problem Solving Group gave an explicit computation, valid for general values
of $\alpha > 0$, that is similar to the inductive proof of the characterization of positive definite
matrices quoted in Solution II.

All other correct solutions were similar to Solution I, though usually restricted to the
special values of $\mu$ and $s_0$ given at the end of that solution.

Solved also by R. J. Chapman (U. K.), N. D. Elkies, P. J. Fitzsimmons, G. Letac (France), O. P. Lossers (The
Netherlands), A. D. Melas (Greece), R. Mercer (Canada), F. Schmidt, A. Tissier (France), L. Wertheim (student, Russia),
GCHQ Problem Solving Group (U. K.), Western Maryland College Problems group, and the proposer. Four incorrect
solutions were received.
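The inequality of Problem 10265 is easy to spot-check numerically. The sketch below is not part of any submitted solution; the function names, the sampling ranges and the fixed seed are illustrative assumptions. It evaluates the quadratic form $\sum_{i,j} a_i a_j/(b_i + b_j)^\alpha$ directly and records the smallest value seen over random choices of the $a_i$, the positive $b_i$, and $\alpha > 0$.

```python
import random

def quadratic_form(a, b, alpha):
    """Return sum over i, j of a_i * a_j / (b_i + b_j)**alpha."""
    return sum(ai * aj / (bi + bj) ** alpha
               for ai, bi in zip(a, b)
               for aj, bj in zip(a, b))

def smallest_sampled_value(trials=200, n=5, seed=0):
    """Smallest value of the form over random samples (hypothetical ranges)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        a = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # any real a_i
        b = [rng.uniform(0.1, 5.0) for _ in range(n)]    # positive b_i
        alpha = rng.uniform(0.1, 4.0)                    # positive alpha
        worst = min(worst, quadratic_form(a, b, alpha))
    return worst
```

A random search of this kind proves nothing, but a clearly negative value would signal a transcription error in the statement. For a hand check, $a = (1, -1)$, $b = (1, 2)$, $\alpha = 1$ gives $1/2 - 2/3 + 1/4 = 1/12$.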


The Tarry-Escott Problem


10284 [1993, 185]. Proposed by Liang-shin Hahn, University of New Mexico, Albuquerque,
NM.
For each positive integer $l$, show that there exists a positive integer $n$ and a partition of
$\{1, \ldots, n\}$ as a disjoint union of two sets $A$ and $B$, such that for $1 \le i \le l$,

$$\sum_{a \in A} a^i = \sum_{b \in B} b^i.$$
Solution by Hillel Gauchman and Ira Rosenholtz, Eastern Illinois University, Charleston,
IL. We prove a more precise statement: For each non-negative integer $l$, there is a partition
of $\{0, 1, \ldots, 2^{l+1} - 1\}$ into sets $A_l$, $B_l$ such that

$$\sum_{a \in A_l} a^j = \sum_{b \in B_l} b^j \quad \text{for } 0 \le j \le l.$$

We use the convention $0^0 = 1$ so that $\sum_{c \in C} c^0 = |C|$ even if $0 \in C$.

Let $A_0 = \{1\}$ and $B_0 = \{0\}$, and define $A_{n+1} = A_n \cup (2^{n+1} + B_n)$ and $B_{n+1} =
B_n \cup (2^{n+1} + A_n)$, where $x + C = \{x + c : c \in C\}$. Trivially, $A_0$, $B_0$ have the desired
property for $l = 0$. Proceeding by induction, we suppose that $A_n$, $B_n$ have the desired
property for $l = n$. Given an integer $j$ with $0 \le j \le n + 1$, we compute

$$\begin{aligned}
\sum_{a \in A_{n+1}} a^j - \sum_{b \in B_{n+1}} b^j
&= \Big(\sum_{b \in B_n} (2^{n+1} + b)^j - \sum_{a \in A_n} (2^{n+1} + a)^j\Big) - \Big(\sum_{b \in B_n} b^j - \sum_{a \in A_n} a^j\Big) \\
&= \sum_{k=0}^{j} \binom{j}{k} 2^{(n+1)(j-k)} \Big(\sum_{b \in B_n} b^k - \sum_{a \in A_n} a^k\Big) - \Big(\sum_{b \in B_n} b^j - \sum_{a \in A_n} a^j\Big).
\end{aligned}$$

By the induction hypothesis, all terms are 0 when $0 \le j \le n$. When $j = n + 1$, the
formula reduces to $\big(\sum_{b \in B_n} b^{n+1} - \sum_{a \in A_n} a^{n+1}\big) - \big(\sum_{b \in B_n} b^{n+1} - \sum_{a \in A_n} a^{n+1}\big)$, which
also equals 0.
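The doubling construction in this solution is short enough to run directly. The following is a minimal sketch under the reading $A_0 = \{1\}$, $B_0 = \{0\}$, $A_{n+1} = A_n \cup (2^{n+1} + B_n)$, $B_{n+1} = B_n \cup (2^{n+1} + A_n)$; the function names are illustrative and not from the solution.

```python
def build_partition(l):
    """Return (A_l, B_l) obtained by applying the doubling rule l times."""
    a, b = {1}, {0}
    for n in range(l):
        shift = 2 ** (n + 1)
        # Both new sets are computed from the old a and b before reassignment.
        a, b = (a | {shift + y for y in b},
                b | {shift + x for x in a})
    return a, b

def power_sums_agree(a, b, l):
    """Check that the sums of j-th powers agree for every 0 <= j <= l.

    Python evaluates 0 ** 0 as 1, which matches the convention of the solution.
    """
    return all(sum(x ** j for x in a) == sum(y ** j for y in b)
               for j in range(l + 1))
```

For $l = 2$ this yields $A_2 = \{1, 2, 4, 7\}$ and $B_2 = \{0, 3, 5, 6\}$: both have four elements, both sum to 14, and both have squares summing to 70.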


Editorial comment. As Richard Guy commented, this is the Tarry-Escott problem, mentioned
in L. E. Dickson, History of the Theory of Numbers, Chelsea, 1971, volume 2,
709-710. Also, A. E. Caicedo Núñez & J. C. Vera Lizcano located it as MONTHLY Problem
E1312 [1958, 284; 1958, 776] by C. F. Pinzka. They also found it in Loren Larson, Problem
Solving Through Problems, Springer, 1983, 163-164. Raul A. Simon found it in Joe
Roberts, Elementary Number Theory, MIT, 1977, p. 88, 110S-111S.

The Tarry-Escott problem has a huge literature. All solvers showed that $2^{n+1}$ numbers
suffice to provide simultaneous equalities up to the $n$th power. G. Myerson noted that an entire
book has been devoted to the subject (A. Gloden, Mehrgradige Gleichungen, Noordhoff,
1944). Several solvers noted that the result extends to three or more sets of integers. This
is Prouhet's problem, which preceded the Tarry-Escott problem, as E. M. Wright observed
in "Prouhet's 1851 solution of the Tarry-Escott problem of 1910," this MONTHLY 66 (1959),
199-201.


Solved also by 38 readers and the proposer. 


A Recurrence Related to Counting Involutions 


10347 [1993, 951]. Proposed by T. S. Nanjundiah, University of Mysore, Mysore, India.
For integer $n \ge 1$, define real numbers $R_n$ by

$$R_1 = 1, \qquad R_{k+1} = 1 + \frac{k}{R_k} \quad (k \ge 1).$$


Prove that 


forn > 1. 


Editorial comment. All solutions followed the outline below. 
Let $a_n = (1 + \sqrt{4n + 1})/2$. Since $f_n(x) = 1 + n/x$ is a decreasing function, the
result will follow by induction from $a_n = f_n(a_n)$ and $a_{n+1} > f_n(a_{n-1})$. These results are
easily verified.

In fact, as noted by Frank Schmidt and the National Security Agency Problems Group 
(independently), this result can be found in Leo Moser & Max Wyman, “On solutions of 
x? = 1 in symmetric groups”, Canadian J. Math. 7 (1955), 159-168. 

John Gaisser and Jonathan Sorenson (jointly) suggested a generalization in which $R_1$ is
arbitrary and subsequent $R_n$ are defined by


Then, for $R_1 > 1$, one has


for $n > 1$. They conjecture that a similar conclusion holds for $R_1 < 1$, provided that
one takes $n > n_0(R_1)$, and they give the following sample values of $n_0$: $n_0(9/10) = 4$,
$n_0(1/2) = 32$, $n_0(1/5) = 410$.

H.-J. Seiffert wrote $R_k$ in the form $p_k/p_{k-1}$ with $p_0 = p_1 = 1$ and the $p_k$ satisfying the
recurrence $p_{k+1} = p_k + k p_{k-1}$. Again, an inductive argument is used. The recurrence for
the $p_k$ allows one to obtain a generating function

$$\sum_{k=0}^{\infty} p_k \frac{x^k}{k!} = e^{x + x^2/2}.$$


This approach is also considered in the paper of Moser and Wyman cited above. 
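Seiffert's rewriting can be checked with exact arithmetic. The sketch below assumes the recurrence reads $R_1 = 1$, $R_{k+1} = 1 + k/R_k$, which is the form consistent with $R_k = p_k/p_{k-1}$ and $p_{k+1} = p_k + k p_{k-1}$; the function names are illustrative. With $p_0 = p_1 = 1$ the $p_k$ begin 1, 1, 2, 4, 10, 26, 76, the familiar counts of solutions of $x^2 = 1$ in the symmetric groups.

```python
from fractions import Fraction

def ratios(count):
    """Return [R_1, ..., R_count] from R_1 = 1, R_{k+1} = 1 + k / R_k."""
    r = [Fraction(1)]
    for k in range(1, count):
        r.append(1 + Fraction(k) / r[-1])
    return r

def involution_numbers(count):
    """Return [p_0, ..., p_count] from p_0 = p_1 = 1, p_{k+1} = p_k + k * p_{k-1}."""
    p = [1, 1]
    for k in range(1, count):
        p.append(p[k] + k * p[k - 1])
    return p

def ratios_match(count):
    """Check R_k == p_k / p_{k-1} for 1 <= k <= count."""
    r, p = ratios(count), involution_numbers(count)
    return all(r[k - 1] == Fraction(p[k], p[k - 1]) for k in range(1, count + 1))
```

Working with `Fraction` rather than floats keeps the comparison exact, so agreement of the two sequences is a genuine identity check for the sampled range and not a rounding coincidence.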


Solved by 56 readers and the proposer. 


Collaborating editors: David F Appleyard, Paul T. Bateman, Duane M. Broline, Barry 
W. Brunson, Frank S. Cater, Gulbank D. Chakerian, Underwood Dudley, Gerald A. Edgar, 
Michael A. Filaseta, Ira M. Gessel, Richard A. Gibbs, Jerrold R. Griggs, Douglas A. Hensley, 
John R. Isbell, Mourad E. H. Ismail, Murray Klamkin, Daniel J. Kleitman, Frederick W. 
Luttmann, Frank B. Miles, Richard Pfiefer, Stephen L. Portnoy, J. O. Shallit, John Henry 
Steelman, Kenneth B. Stolarsky, David E. Tepper, Douglas B. Tyler, Daniel Ullman, and 
William E. Watkins. 


I have often observed ... that among some of the most capable, 
research-wise of new Ph.D.s can often be found the greatest lack of 
knowledge concerning the background and significance of their work, 
as well as abysmal ignorance of the reasons for doing it and of 


the general nature of mathematics. In fact, they are uneducated 
specialists. 
—R. L. Wilder (1972) 
Polygonal Rooms Not Illuminable from
Every Point


George W. Tokarsky 


1. INTRODUCTION. Imagine two people in a dark room with many turns and
cul-de-sacs. Assuming that the walls, floors and ceilings are constructed of reflective
material, can one person strike a match and be seen by the other after
repeated reflections, no matter where the two are located?


Figure 1. Can a match lit somewhere along the dotted line at $A_0$, say, be seen somewhere along the
dotted line at $A_1$, say?


This problem has been attributed to Ernst Straus in the early 1950’s, and has 
remained open for over forty years. It was first published by Victor Klee in 1969 
[1]. It has since reappeared on various lists of unsolved problems, notably Klee 
again in 1979 [2] and in two recent books on unsolved problems, one by Klee and 
Wagon in 1991 [3] and one by Croft, Falconer and Guy, also in 1991 [4]. 

In this article, we will settle the above problem in the negative. We will as well 
give elementary techniques for constructing rooms, both in the plane and in 
three-space, which are not illuminable from every point. In particular, we will show 
that if the two people are located in a two-dimensional planar room as shown in 
Figure 2, then they cannot see each other. 


2. THE PLANAR PROBLEM. If G is a bounded simple polygonal region in the 
plane, is G illuminable from every point? In other words, if we view the sides of G 
as mirrors, can a single light source placed at any point illuminate or be seen at
every other point of the room? The problem can equivalently be posed in terms of 
a billiard ball bouncing around a pool table. Is there a “pool shot” between any 
two points on a polygonal pool table? 

A light ray or pool ball reflects only at the sides of the room in such a way that 
the angle of incidence equals the angle of reflection. A light ray or pool ball that 
strikes a vertex is considered to end or be absorbed there. 
Figure 2


All paths or pool shots will be taken to be of non-zero length. 

The main idea in solving this problem is that any path in a polygon unfolds to a
path in another polygon constructed from mirror images of the first. Conversely,
the second path can be considered to fold up to the first.


Example 1. Path ABCD in 3(a) corresponds to the straight line path ABCD in 
3(b). 


Figure 3 


Example 2. Path ABCDEF in 4(a) corresponds to the path ABCDEF in 4(b). 


Figure 4 
3. SOLUTION. We will first need the following lemma.


Lemma 3.1. In an isosceles right triangle ABC (with right angle at C), there do not 
exist any pool shots from A coming back to A. 


Figure 5


Proof: We start by taking a lattice of mirror images of this triangular table and 
assigning integer coordinates to the vertices as shown below, with A at the origin. 
Vertices labelled A have even coordinates (2m,2n) and vertices labelled B or C 
all have at least one odd coordinate. A pool shot from A to A on the original table 
would unfold or correspond to a straight line segment joining A(0,0) to say 
A(2m, 2n) in the lattice. This segment then must pass through the point (m, n) [or 
(m/2,n/2) if both m and n are even, etc.] and thus must pass through a point 
labelled B or C. This means the pool shot would hit a vertex and be absorbed 
before returning to A. ∎
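The parity argument in this proof can be checked mechanically. The sketch below assumes, as in the proof, that every integer lattice point of the unfolded plane is a vertex, labelled A exactly when all of its coordinates are even; the function names are illustrative. It finds the first lattice point on the straight segment from the origin to a target labelled A and confirms that this point comes strictly before the target and carries an odd coordinate, so it is labelled B or C.

```python
from functools import reduce
from math import gcd

def first_vertex_hit(target):
    """First lattice point after the origin on the segment from the origin to target.

    target must not be the origin itself.
    """
    g = reduce(gcd, target)          # gcd of all coordinates (non-negative)
    return tuple(c // g for c in target)

def blocked(target):
    """True if the segment meets a vertex with an odd coordinate before reaching target."""
    hit = first_vertex_hit(target)
    return hit != tuple(target) and any(c % 2 for c in hit)
```

The first lattice point has coprime coordinates, so at least one of them is odd, while a target of the form $(2m, 2n)$ has all coordinates even and therefore cannot be that first point. Because the helper accepts tuples of any length, the same check covers targets $(2m, 2n, 2p)$ in a cubic lattice.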


Figure 6


Theorem 3.2. There do not exist any pool shots from $A_0$ to $A_1$ on the table shown in
Figure 2.


Proof: This table is constructed by taking mirror images of a right angled isosceles 
triangle as shown in Figure 7. The key to the diagram and the proof is that any 
point labelled B or C must be a vertex of this table, while points labelled A do not 
have to be. 
Figure 7


If there were a pool shot from $A_0$ to $A_1$, the initial path must pass through the
interior of one of the eight triangles surrounding $A_0$. Let us call this triangle $T$. As
in the lemma, a pool shot from $A_0$ to $A_1$ would correspond or fold up to a pool
shot from $A$ to $A$ in triangle $T$, which is impossible. ∎


Incidentally, it should be clear from the proof that there does not exist a pool 
shot between any two points labelled A on this table. 


4. OTHER TABLES 


Example 3. It would be interesting to find the table with the least number of sides 
which is not illuminable from every interior point. The example below has 26 sides. 


Figure 8 


Example 4. By using the same kind of lattice argument given for the isosceles right 
triangle, there do not exist any pool shots from a corner of a square pool table 
ABCD coming back to itself. We can also get this result by observing that a square 
is the mirror image of a right isosceles triangle in its hypotenuse and that a path in 
the square folds to a path in the triangle. A square then can be used to construct 
tables with only right angles, one of which is shown below. Again, we must follow 
the rule that points labelled B, C or D must remain vertices, while there isn’t any 
restriction on points labelled A. 
Figure 9


To construct tables using other kinds of triangles, we need a different type of 
argument. 


Lemma 4.1. If $x$ divides 90 and $\angle A$ has size $x°$ and $\angle B$ has size $nx°$ where $n$ is a
positive integer, then the triangular pool table ABC does not have a pool shot from A
coming back to A.


Figure 10


Proof: We measure all angles mod $2x$.


Case I. $n$ is even.

Let $0 < \theta < x$ be the angle of a pool shot leaving A as in Figure 11(a), then
inductively it bounces off sides AB and BC at angles $\pm\theta$ and side AC at angles
$x \pm \theta$ as shown in Figure 11(b)(c)(d).

If it comes back to A then it must re-enter at the angle $\pm\theta$ mod $2x$, but since
$0 < \theta < x$, $-\theta$ is impossible. Hence, it must re-enter at the same angle $\theta$ that it
left. This can only happen if the pool shot hits one of the sides at 90°. But
then $\pm\theta \equiv 90 \bmod 2x$, which implies that $\pm\theta \equiv 0 \bmod x$ (since $x$ divides 90), or
$x \pm \theta \equiv 90 \bmod 2x$, which again implies that $\pm\theta \equiv 0 \bmod x$. This is impossible
since $0 < \theta < x$.


Case II. $n$ is odd.
Similar to the first case, a pool shot leaving A at an angle $0 < \theta < x$ hits side
AB at angles $\pm\theta$, and sides BC and AC at angles $x \pm \theta$ as shown in Figure 12.
If it returns to A, then as before it must return at the same angle $\theta$ that it left.
This is impossible for the same reason given above. ∎


Figure 11 


Figure 12 
Example 5. On any symmetric pool table of the type shown below where $x$ divides
90 and with angles B and C having size different from 180° (which guarantees that
B and C remain vertices), there does not exist a pool shot from A to D.


Figure 13
Proof: A pool shot from A to D would fold up to a pool shot from A to A in
triangle ABC which is impossible. ∎


This is an example of a quadrilateral pool table in which it is not possible to 
make a pool shot between two distinct boundary points. 


Example 6. By the lemma, there do not exist any pool shots from A to A on the
triangle ABC shown below with $m(\angle A) = 9°$ and $n = 8$.


Figure 14 


By taking mirror images of this triangle and following the usual rule that B and
C must remain vertices, we can construct a pool table without right angles which
does not have a pool shot from $A_0$ to $A_1$.


Figure 15
This example can be extended to construct pool tables with any finite number of
pool shots that cannot be made.


5. GENERAL CONSTRUCTION THEOREM 


Theorem 5.1. Let G be a pool table built from a triangle ABC of the type shown in
Figure 10 and which is constructed using only successive mirror images of this triangle.
If G is constructed following the rule that every occurrence of B or C is a vertex, then
there does not exist a pool shot between any two points labelled A.


Proof: The pool shot is impossible by Lemma 4.1, since a path between any two
points labelled A corresponds to a pool shot from A to A in triangle ABC. ∎


This is the general construction result used to form the various polygonal tables. 


6. THREE DIMENSIONAL EXAMPLES. In three space, reflection occurs only at 
points which have tangent planes, and rays bounce off the surface such that the 
angle between the incoming ray and the normal equals the angle between the 
outgoing ray and the normal. The incoming ray, the outgoing ray and the normal 
must be coplanar. Any ray which hits a vertex or an edge does not reflect. 

If P and Q are parallel planes, it is known that a parallel projection between P 
and Q will preserve angles and hence reflections. This is not so if the planes are 
not parallel. 

However, if a reflection occurs off a face whose normal $n$ is either perpendicular
or parallel to a plane P, and Q is the plane formed by the two reflecting rays,
then an orthogonal projection taking Q to P will preserve the reflection. (If $n$ is
perpendicular to P, the projected image is a straight line).


Figure 16 
This means that if we form a cylinder on any of the polygonal rooms $R$ already
constructed to form a polytopal room $R \times I$, then a pool shot in the polytopal
room would project orthogonally to a pool shot in $R$.


Example 7. The following polytopal room is not illuminable from every point. In
particular there does not exist a pool shot from any point on $A_0 \times I$ to any point
on $A_1 \times I$.


Figure 17 


Proof: This would correspond to a pool shot from $A_0$ to $A_1$ in the room of Figure
9, which is impossible.


7. NON-CYLINDRICAL EXAMPLES 


Lemma 7.1. Given a cube with one corner labelled A, there do not exist any pool shots 
from A coming back to A. 


Proof: Let us take a lattice of mirror images of the cube with A at the origin and 
the vertices having integer coordinates. The A’s appear at even coordinates 
(2m,2n,2p) and every other vertex has at least one odd coordinate. As before a 
pool shot from A to A in the original cube corresponds to a straight line segment 
from A(0, 0,0) to A(2m,2n,2p) which must pass through a vertex other than A. 
It follows that the pool shot is impossible. ∎


By virtually the same lattice argument, there does not exist a pool shot from A 
to any point on any edge attached to A. Alternately, we can use the projection 
argument with a given cube ABCDEFGH. If there were a reflecting path from A 
to X where X is on AH say, then using a suitable orthogonal projection, this path 
projects onto another path from A to A in the square ABCD which is impossible. 
Figure 18


Example 8. It is now easy to construct a polytopal room with two interior points 
which are not illuminable from each other. We need only take mirror images of the 
cube in Figure 18 following the rule that any edge not attached to say vertex A 
must remain an edge. The following example was constructed in this way. 


Figure 19 


Proof: The above rule guarantees that a pool shot leaving A and hitting another 
point labelled A must pass through the interior of one of the cubes surrounding it. 
By the comment to Lemma 7.1 and the unfolding argument, it could never hit the 
second A. ∎


More generally, we can use cylindrical triangular building blocks by making use 
of the following lemma. 
Lemma 7.2. Let T be a cylinder built on a triangle of the type shown in Figure 10, then
there does not exist a pool shot between any two points on edge AD.


Figure 20 


Proof: If we orthogonally project the cylinder onto triangle ABC, then a path in 
the cylinder between two points on AD corresponds to a path from A to A in 
triangle ABC which is impossible. ∎


We immediately obtain the following result. 


Three Dimensional Construction Theorem 7.3. Let G be a polytopal room built 
from cylindrical triangles T of the type shown in Figure 20 and which is constructed 
using only successive mirror images of T. If G is constructed following the rule that 
every occurrence of an edge different from AD remains an edge, then 


(a) there does not exist a pool shot between any two points labelled A, 

(b) there does not exist a pool shot between A and any D not immediately attached
to A, 

(c) there does not exist a pool shot between A and any interior point of a segment 
labelled AD which is not attached to the original A, and 

(d) there does not exist a pool shot between any interior point X of AD and any 
interior point Y of a different segment labelled AD. 


Proof: The above rule guarantees that a pool shot leaving X and hitting Y must 
pass through the interior of one of the cylindrical triangles surrounding X. By 
Lemma 7.2 and the unfolding argument, it can never hit Y. Similar proofs can be 
given for the other statements. ∎


By symmetry, the result also holds if we interchange A and D. 


8. A NON-POLYTOPAL EXAMPLE. We give a three dimensional example which 
is not polytopal and non-cylindrical but is a simple solid of revolution. 


Example 9. If we take any symmetric quadrilateral of the type shown in Figure 13 


and rotate it about the axis AD, then there does not exist a pool shot from vertex 
A to D, or A back to A or D back to D. 
Figure 21


Proof: Rays emitted from A (say) stay in the plane determined by AD and the ray
and hence correspond to a pool shot in the planar table of Figure 21(a). ∎


It is possible that such a device would have physical applications, in acoustics or 
thermodynamics. Rays generated at A or D never reach the opposite vertex and 
never come back on themselves. 


9. HISTORY. The illumination problem has been tentatively traced back to Ernst 
Straus in the early 1950’s. Two questions were posed. 


(1) Is a polygonal region illuminable from every point in the region? and 
(2) Is a polygonal region illuminable from at least one point in the region? 


Penrose and Penrose, 1958 [5], in an entertaining article constructed a smooth 
region based on properties of the ellipse which is not illuminable from various 
points. Other authors in written communications, then modified this example to 
construct a smooth region not illuminable from any point. Thus, both questions 
were answered negatively for smooth regions. Rauch, 1978 [6], gave an example of 
a smooth region not illuminable from any finite set of points. 

On the other hand, the solution for polygonal regions was not forthcoming and 
no significant progress appeared in the literature. The nature of these problems, 
being easily stated and easily understood together with their apparent intractability 
had an obvious appeal. Thus, they started appearing on various lists of unsolved 
problems. Klee’s paper, 1969 [1], seems to be the first published version. This was 
followed by a survey article of Klee and Guy, 1971 [7]. Klee again, 1979 [2], in an 
excellent exposition provided a list of the ten most appealing unsolved problems in 
plane geometry of which the illumination problem was his fifth. Recently, in 1991, 
two texts [3] and [4] of unsolved problems have been published both of which give 
excellent discussions of the two illumination problems. 

I think that Klee [2] best captured the spirit of these problems in his 1979 paper 
subtitled, “A collection of simply stated problems that deserve equally simple 
solutions”. 

He eloquently says, “In considering the problems of this paper, it is natural to 
wonder whether anyone has a reasonable chance of solving them. I can’t answer 
that, except to say that problems of this sort are great equalizers among mathematicians,
for solutions usually depend on clever ideas rather than extensive
knowledge or development of complicated mathematical machinery.”

The second problem is still open. 
Neither you nor I nor anybody else knows what makes a mathematician
tick. It is not a question of cleverness. I know many mathematicians
who are far abler than I am, but they have not been so lucky.
An illustration may be given by considering two miners. One may be
an expert geologist, but he does not find the golden nuggets that the
ignorant miner does.
—L. J. Mordell 


Mathematical Circles Adieu. Howard W. Eves,
Boston: Prindle, Weber and Schmidt, 1977.


Three Sing-Sing Problems


Gunnar Blom, Lars Holst and Dennis Sandell 


1. INTRODUCTION AND SUMMARY. In this paper, we consider three prob- 
lems, called by us the linear Sing-Sing problem, the cyclic Sing-Sing problem and 
the matching Sing-Sing problem. The problems are probably not new, but we have 
no references. However, the problems can be regarded as special matching 
problems in random graphs; see Barbour, Holst and Janson (1992) Section 4.4. 


Linear Sing-Sing problem. If the letters in the word SINGSING are permuted at 
random, we may obtain, for example, INGGINSS or SNINGISG. In the former 
case, the permutation contains two pairs of equal neighbours, GG and SS, but in 
the latter case there are no such neighbours. What is the probability that the 
permutation has no equal neighbours? As we will show, the answer is 12/35. 

Using numbers instead of letters, the answer can be obtained from the solution 
to the linear Sing-Sing problem: What is the probability distribution of the number 
of equal neighbours in a random permutation of the 2n numbers 1122... nn? 


Cyclic Sing-Sing problem. Consider the word SINGSING written in a circle: 


Figure 1 


What is now the probability that in a random permutation there are no equal 
neighbours? The answer is 31/105. 

More generally, in the cyclic Sing-Sing problem the numbers 1122...nn are 
written in a circle and we ask for the distribution of the number of equal 
neighbours after a random permutation. 


Matching Sing-Sing problem. We write SINGSING in two rows:

    S I N G
    S I N G

and permute the letters at random. Regarding the letters in each column as
neighbours, what is the probability that there are no equal neighbours? The 
answer is 4/7. 
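
The three stated answers can be checked by brute force, since the word has only 8! = 40320 orderings of its letter positions. The following is a minimal sketch, not the authors' method; the names `LINEAR`, `CYCLIC`, `MATCHING` and `prob_no_equal_neighbours` are hypothetical helpers introduced here, and the matching case assumes the two rows are positions 0 to 3 and 4 to 7.

```python
from fractions import Fraction
from itertools import permutations

WORD = "SINGSING"

def count_equal_neighbours(arrangement, pairs):
    """Number of index pairs (a, b) in `pairs` that hold the same letter."""
    return sum(arrangement[a] == arrangement[b] for a, b in pairs)

def prob_no_equal_neighbours(pairs):
    """Exact probability that a uniformly random ordering has no equal neighbours."""
    total = 0
    good = 0
    # Permuting the 8 positions (equal letters treated as distinct) is uniform.
    for arrangement in permutations(WORD):
        total += 1
        if count_equal_neighbours(arrangement, pairs) == 0:
            good += 1
    return Fraction(good, total)

m = len(WORD)
LINEAR = [(i, i + 1) for i in range(m - 1)]          # neighbours in a row
CYCLIC = [(i, (i + 1) % m) for i in range(m)]        # neighbours on a circle
MATCHING = [(i, i + m // 2) for i in range(m // 2)]  # columns of the 2 x 4 array

if __name__ == "__main__":
    for name, pairs in [("linear", LINEAR), ("cyclic", CYCLIC), ("matching", MATCHING)]:
        print(name, prob_no_equal_neighbours(pairs))
```

Exact rational arithmetic is used so that the result can be compared directly with the fractions quoted in the text.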

As before, we generalize the situation by writing the numbers 1122...nn in two
rows:

    1 2 ... n
    1 2 ... n

We now obtain the matching Sing-Sing problem: What is the distribution of the 
number of equal neighbours after a random permutation? 

Denote in each problem the number of equal neighbours by $X_n$.

In Section 2 exact expressions for the distribution and moments of $X_n$ are
obtained.

In Section 3 it is shown that, when n tends to infinity, the distribution of $X_n$
converges to a Poisson distribution with mean 1, 1 and 1/2 for the linear, cyclic
and matching problems, respectively.

In Section 4 the accuracy of the Poisson approximations is studied by some 
numerical examples. Improved approximations can be obtained using the binomial 
distribution in the linear and the cyclic problems and the negative binomial 
distribution in the matching problem. 


2. EXACT RESULTS 


Theorem 1. In the linear, cyclic and matching problems, the mean $E(X_n)$ is

\[ 1, \qquad \frac{2n}{2n-1}, \qquad \frac{n}{2n-1}, \]

respectively. The variance $\mathrm{Var}(X_n)$ is

\[ 1 - \frac{1}{2n-1}, \qquad \frac{2n}{2n-1}\Bigl[1 - \frac{1}{2n-1}\Bigr], \qquad \frac{n}{2n-1}\Bigl[1 + \frac{1}{(2n-1)(2n-3)}\Bigr]. \]


Note that the variance is somewhat less than the mean in the linear and the 
cyclic problems, but slightly larger than the mean in the matching problem. The 
proof of the theorem is given after the following theorem: 


Theorem 2. The probability function of $X_n$ is given by

\[ P(X_n = k) = \sum_{j=k}^{n} (-1)^{j-k} \binom{j}{k} S_j \]

for k = 0, 1, ..., n, where $S_0 = 1$ and

\[ S_j = \binom{n}{j} \frac{2^j}{2n(2n-1)\cdots(2n-j+1)} \]

in the linear problem,

\[ S_j = \frac{2n}{2n-j} \binom{n}{j} \frac{2^j}{2n(2n-1)\cdots(2n-j+1)} \]

in the cyclic problem and

\[ S_j = \binom{n}{j} \frac{1}{(2n-1)(2n-3)\cdots(2n-2j+1)} \]

in the matching problem.


Proof: We first remark that $S_j$ is the binomial moment defined by

\[ S_j = E\left[\binom{X_n}{j}\right]. \]

The formula for the probability function can be regarded as a generalization of the
inclusion-exclusion formula. It holds generally for any distribution on 0, 1, ..., n;
see Blom, Holst and Sandell (1994, Section 3.5); the reader may also consult Feller
(1968, Chapter IV). We therefore only have to prove the three given relations for
the binomial moments.

Introduce in all three cases zero-one random variables $I_1, \ldots, I_n$, where $I_i = 1$
if i and i are neighbours, and $I_i = 0$ otherwise. Then we have $X_n = I_1 + \cdots + I_n$
and, as the $I_i$'s are exchangeable, we obtain

\[ E\left[\binom{X_n}{j}\right] = \sum_{i_1 < \cdots < i_j} E(I_{i_1} \cdots I_{i_j}) = \binom{n}{j} P(I_1 = \cdots = I_j = 1). \]

It remains to calculate $P(I_1 = \cdots = I_j = 1)$ in each problem.


(i) The matching problem 

Place the numbers at random in the 2 × n matrix by first putting one 1 in any
entry, then the other 1 at random in any of the 2n − 1 remaining entries, then one
2 in one of the 2n − 2 remaining entries, etc. The probability of getting the second
1 in the same column as the first is 1/(2n − 1). Given that the two 1’s are in the
same column, the conditional probability that the 2’s are in the same column is
1/(2n − 3). Repeating this argument we get

\[ P(I_1 = \cdots = I_j = 1) = \frac{1}{(2n-1)(2n-3)\cdots(2n-2j+1)}, \]

which proves the assertion.


(ii) The linear problem 

We permute 1122...nn in n − j + 1 steps: First, the numbers 1122...jj are
ordered at random. Second, the two (j + 1)’s are inserted at random. Third, the
two (j + 2)’s are inserted at random, etc.

We are interested in the event that when performing the first operation, the
numbers 1 and 1, 2 and 2, ..., and j and j become neighbours, and when
performing the other operations, the pairs 11, 22, ..., jj are never separated.

The probability that in the first step 1 and 1, 2 and 2, ..., and j and j become
neighbours is

\[ \frac{j!\,2^j}{(2j)!} = \frac{2^j}{2j(2j-1)\cdots(j+1)}. \]

When in the second step we insert j + 1 twice, the conditional probability that the
pairs 11, 22, ..., jj are not separated is

\[ \Bigl(1 - \frac{j}{2j+1}\Bigr)\Bigl(1 - \frac{j}{2j+2}\Bigr). \]

Similar conditional probabilities are obtained when j + 2 is inserted twice, etc.
Multiplying a chain of conditional probabilities, we obtain

\[ P(I_1 = \cdots = I_j = 1) = \frac{2^j}{2j(2j-1)\cdots(j+1)} \prod_{i=2j+1}^{2n} \Bigl(1 - \frac{j}{i}\Bigr) = \frac{2^j}{2n(2n-1)\cdots(2n-j+1)}. \]


(iii) The cyclic problem 
Only a slight modification of the proof in (ii) is needed. 
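
Proof of Theorem 1: We seek $E(X_n)$ and $\mathrm{Var}(X_n)$. Using the same notation as in
Theorem 2 we find

\[ E(X_n) = E\left[\binom{X_n}{1}\right] = S_1 \]

and

\[ \mathrm{Var}(X_n) = 2E\left[\binom{X_n}{2}\right] + E(X_n) - [E(X_n)]^2 = 2S_2 + S_1 - S_1^2. \]

We obtain from Theorem 2

\[ S_1 = \binom{n}{1}\frac{2}{2n} = 1, \qquad S_2 = \binom{n}{2}\frac{2^2}{2n(2n-1)} = \frac{n-1}{2n-1} \]

in the linear problem,

\[ S_1 = \frac{2n}{2n-1}\binom{n}{1}\frac{2}{2n} = \frac{2n}{2n-1}, \qquad S_2 = \frac{2n}{2n-2}\binom{n}{2}\frac{2^2}{2n(2n-1)} = \frac{n}{2n-1} \]

in the cyclic problem, and

\[ S_1 = \binom{n}{1}\frac{1}{2n-1} = \frac{n}{2n-1}, \qquad S_2 = \binom{n}{2}\frac{1}{(2n-1)(2n-3)} = \frac{n(n-1)}{2(2n-1)(2n-3)} \]

in the matching problem. Introducing these three pairs of S’s in the expressions for
$E(X_n)$ and $\mathrm{Var}(X_n)$ we obtain the results in the theorem.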


Remark. The original linear Sing-Sing problem is obtained by taking n = 4 in the
general linear problem in Theorem 2. This gives $S_1 = 1$, $S_2 = 3/7$, $S_3 = 2/21$ and
$S_4 = 1/105$, from which we find $P(X_4 = 0) = 12/35$, as already stated at the
beginning of the paper. The corresponding probability in the original cyclic
problem is 31/105 and in the original matching problem 4/7; see Table 1.


TABLE 1. Exact probability distribution $P(X_n = k)$, n = 2, 3, 4,
for the linear, cyclic, and matching case.

[Table body, with rows for the linear, cyclic and matching cases: entries not recoverable.]

3. POISSON CONVERGENCE. Let Po(m) denote a Poisson distribution with 
mean m. 


Theorem 3. In the linear, cyclic and matching Sing-Sing problems, the number of
equal neighbours $X_n$ converges in distribution to Po(1), Po(1) and Po(1/2),
respectively, as n tends to infinity.


Proof: From the expressions for the binomial moments of $X_n$ in Theorem 2 it
follows that these moments converge to $1/j!$, $1/j!$ and $(1/2)^j/j!$ as n tends to
infinity. As these are the binomial moments of the limiting Poisson distributions,
the assertion follows.
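
As a sketch of the limits used in this proof (for fixed j and n tending to infinity, with $S_j$ the binomial moments of Theorem 2; the asymptotic steps are filled in here and are not part of the original text):

\[
\begin{aligned}
\text{linear:}\quad & S_j = \binom{n}{j}\frac{2^j}{2n(2n-1)\cdots(2n-j+1)} \sim \frac{n^j}{j!}\cdot\frac{2^j}{(2n)^j} = \frac{1}{j!}, \\
\text{cyclic:}\quad & S_j = \frac{2n}{2n-j}\binom{n}{j}\frac{2^j}{2n(2n-1)\cdots(2n-j+1)} \sim 1\cdot\frac{1}{j!}, \\
\text{matching:}\quad & S_j = \binom{n}{j}\frac{1}{(2n-1)(2n-3)\cdots(2n-2j+1)} \sim \frac{n^j}{j!}\cdot\frac{1}{(2n)^j} = \frac{(1/2)^j}{j!}.
\end{aligned}
\]

For comparison, a Poisson variable Y with mean m has binomial moments $E\bigl[\binom{Y}{j}\bigr] = m^j/j!$, which gives m = 1, 1 and 1/2 in the three cases.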

As a measure of the difference between the probability distributions of two
random variables X and Y the variation distance

\[ d(X, Y) = \sup_A |P(X \in A) - P(Y \in A)| \]

is sometimes used. For non-negative integer random variables it can be written in
the form

\[ d(X, Y) = \frac{1}{2} \sum_{k=0}^{\infty} |P(X = k) - P(Y = k)|. \]

A small value of $d(X, Y)$ indicates that it might be reasonable to approximate the
distribution of X with that of Y. In the recent monograph Barbour, Holst and 
Janson (1992) general results on Poisson approximation are obtained and a variety 
of applications are studied. As a corollary of results in Section 4.4 in that book one 
can deduce: 


Theorem 4. Upper bounds of the variation distances between $X_n$ and Poisson
distributions with the same means as $X_n$ in the linear, cyclic and matching problems are,
respectively,


[1 — exp(—1)] 5. 


E - ew |] 


1 - (5 yey

Note that in each case the variation distance goes to zero as n goes to infinity,
giving an alternative proof of Theorem 3. Also note that Theorem 4 is stronger
than Theorem 3 as it also gives rates for the Poisson convergence and provides
bounds on the maximal errors of the Poisson approximations.


4. APPROXIMATIONS. We shall now consider other approximations, binomial
distributions in the linear and the cyclic problems and a negative binomial
distribution in the matching problem. In all three cases we use distributions with
the same mean and variance as $X_n$; see Theorem 1. For the linear Sing-Sing
problem, the distribution of $X_n$ is approximated by a binomial distribution with
parameters 2n − 1 and 1/(2n − 1). In the cyclic Sing-Sing problem, the
approximating binomial distribution has parameters 2n and 1/(2n − 1). The
approximating distribution of $X_n$ in the matching Sing-Sing problem is negative binomial with
parameters N = n(2n − 3) and P = 1/[(2n − 1)(2n − 3)]; here we use the following
form for the probability distribution of the negative binomial:

\[ p(k) = \binom{-N}{k} (-P)^k (1 + P)^{-N-k} \]

for k = 0, 1, .... Note that all three distributions are for large n close to Poisson
distributions.
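
The quality of these approximations in the linear case can be explored numerically. The sketch below computes the variation distance between the exact linear distribution and its Poisson and binomial approximations. The function names are hypothetical, the exact distribution uses the linear $S_j$ as read from Theorem 2, and the infinite sum is truncated at `kmax` (an assumption that is harmless here because the Poisson tail beyond 60 is negligible).

```python
from fractions import Fraction
from math import comb, exp, factorial

def linear_distribution(n):
    """Exact P(X_n = k), k = 0..n, in the linear problem, returned as floats."""
    S = []
    for j in range(n + 1):
        denom = 1
        for t in range(j):
            denom *= 2 * n - t                   # 2n(2n-1)...(2n-j+1)
        S.append(Fraction(comb(n, j) * 2**j, denom))
    return [float(sum((-1) ** (j - k) * comb(j, k) * S[j] for j in range(k, n + 1)))
            for k in range(n + 1)]

def poisson_pmf(mean, k):
    return exp(-mean) * mean**k / factorial(k)

def binomial_pmf(trials, p, k):
    if k > trials:
        return 0.0
    return comb(trials, k) * p**k * (1 - p) ** (trials - k)

def variation_distance(p, q_pmf, kmax=60):
    """Half the sum of |p(k) - q(k)| over k = 0..kmax; p is a list, q_pmf a function."""
    total = 0.0
    for k in range(kmax + 1):
        pk = p[k] if k < len(p) else 0.0
        total += abs(pk - q_pmf(k))
    return total / 2

def linear_distances(n):
    """(distance to Po(1), distance to Bin(2n-1, 1/(2n-1))) for the linear problem."""
    exact = linear_distribution(n)
    d_poisson = variation_distance(exact, lambda k: poisson_pmf(1.0, k))
    d_binomial = variation_distance(
        exact, lambda k: binomial_pmf(2 * n - 1, 1 / (2 * n - 1), k))
    return d_poisson, d_binomial

if __name__ == "__main__":
    for n in (4, 5, 10):
        print(n, linear_distances(n))
```

The same `variation_distance` helper could be reused for the cyclic and matching cases by swapping in the corresponding exact distribution and approximating probability function.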

In Table 2 we compare, for each of the three original Sing-Sing problems
(n = 4), the exact probability distribution with that obtained by the Poisson
approximation and that obtained by the alternative approximation. As seen from
the table the alternative approximation is very good. In Table 3 we reproduce in
the linear case the variation distance between $X_n$ and each of the two
approximating distributions. Evidently, the binomial distribution provides a much better
approximation than the Poisson distribution. The variation distance for the
matching case is given in Table 4; clearly, the negative binomial approximation is
superior to the Poisson approximation. However, to our knowledge there is no
general theory available providing bounds on the error of such binomial or
negative binomial approximations.


TABLE 2. Probability distribution $P(X_4 = k)$ in the second column;
approximations in the third and fourth columns.

[Table body largely unrecoverable; one surviving row reads 0.0762, 0.0755.]


TABLE 3. Linear case: variation distance between true distribution
and two approximating distributions.

    n    Poisson     Binomial
    4    0.038657    0.007063
    5    0.030191    0.003967

[Remaining rows not recoverable.]


TABLE 4. Matching case: variation distance between true distribution
and two approximating distributions.

    n    Poisson     Negative binomial
    4    0.035812    0.030735
    ...
    ?    0.000279    0.000014
    ?    0.000153    0.000006
    ?    0.000097    0.000003

[Intermediate rows and the values of n for the last three rows not recoverable.]


(from the collection of Paul Halmos)


How are they related?
(See page 892)


A Nobel Prize in Mathematics


John E. Morrill 


On October 11, 1994, the Royal Swedish Academy of Science announced that John
F. Nash was among those selected to receive a 1994 Nobel Prize. Keith Devlin, in
FOCUS, the newsletter of the Mathematical Association of America, later wrote
that this “meant for the first time in the 93-year history of the Nobel Prizes, the
prize was awarded for work in pure mathematics” ([4], p. 1). But Nash had not
been chosen for the award “often referred to as the ‘Nobel Prize of Mathematics’”
([13], p. 168), the Fields Medal, or more precisely the International Medal for
Outstanding Discoveries in Mathematics. Rather, he really had been selected to
receive a Nobel—the Nobel Memorial Prize in Economic Science.

Upon reading this announcement, I was reminded of the many “answers” I had
heard or read to the often asked question concerning “...the reason for there not
being a Nobel Prize in Mathematics” (e.g., [7]). I wondered if this was really still an
unsolved problem. If not, then the solution to this non-existence question is not
well known, for we find in [4] the repetition of the speculation “...that a
particularly bad experience in mathematics in high school led to the exclusion....”
More often, however, one hears variations on “The widely circulated explanation of
why there are no Nobel Prizes in Mathematics is that Alfred Nobel wanted to
make sure that the mathematician Mittag-Leffler would never be awarded a Nobel
Prize” ([1], p. 39).

In my own mind the matter of the “Nobel in Mathematics” more or less had 
been settled by two responses to the query in that Monthly Letter [7]. The first, 
also appearing as a Letter to the Editor in this Monthly, [2], includes 


I can offer some evidence relating to Professor King’s query about the 
reason for the absence of a Nobel Prize in mathematics (August-September, 
1983). While I was an undergraduate at Northwestern in the early 60’s I 
heard the rumor that Nobel’s refusal to endow a prize in mathematics was 
due to a grudge against Mittag-Leffler. As I heard it then, the two had 
quarreled over a woman. Later I read that Mittag-Leffler had accumulated a 
fortune, and somehow annoyed Nobel in the process. 

During a three-month visit to the Institute Mittag-Leffler in 1981 I had the 
opportunity to talk with archivist Barbara Bjornberg, who knows the personal 
lives of the people around Mittag-Leffler extremely well. She had already 
heard the rumor and did not believe there was any truth to it. She pointed 
out to me that Nobel never married and that “Mittag-Leffler’s fortune” was 
actually his wife’s dowry. I could not find any evidence to corroborate the 
rumor. Since the absence of evidence is not the same as evidence of absence, 
the question really must be investigated from the other end. Who first made 
the allegations, and on what evidence? 

Such evidence as I do have suggests that there is nothing surprising in the 
absence of a Nobel Prize in mathematics. The question itself seems to
conceal a questionable assumption. Why should there be a Nobel Prize in
mathematics? Nobel may not have considered mathematics important. Or,
being a neighbor of Mittag-Leffler, he may have thought that the many prizes
and honors Mittag-Leffler had obtained from King Oscar II for mathematicians
were already sufficient recognition. These honors were quite extensive;
they went to Poincaré, Appell, Bertrand, Hermite, Weierstrass, and
Kovalevskaya, among others.


The second respondents ([5], p. 73) conclude much the same.


There are Nobel prizes in physics and chemistry, so why not in mathemat- 
ics? There are two current answers. 

1. (French-American version) Mittag-Leffler had an affair with Nobel’s 
wife. 

2. (Swedish version) Mittag-Leffler was the leading Swedish mathemati- 
cian at the time when Nobel wrote his will. Nobel knew that if there was to 
be a prize in mathematics, Mittag-Leffler could use his influence in the Royal 
Swedish Academy of Science to become the first recipient. To avoid this, 
Nobel gave no prize in mathematics. 

Although Nobel was a confirmed bachelor, the French-American version 
leads a healthy life as one of the myths of mathematics and as a recurrent 
subject of conversation of mathematicians who think it is unfair that physics 
has a prize but not mathematics. The Swedish version is an academic 
fabrication with no credibility. In fact, Nobel and Mittag-Leffler had almost 
no relation to each other. The true answer to the question is that, for natural 
reasons, the thought of a prize in mathematics never entered Nobel’s mind. 


Similarly, Devlin comments “...it may simply be that Nobel felt that mathematics
was not, in itself, of sufficient relevance to human development to warrant its
own award” ([4], p. 1).

In looking at this story another question came to mind—has the Fields Medal 
ever been awarded to a (mathematical) economist? In examining this question I 
was led to the elegant illustrated history, International Mathematical Congresses, [1]
and found there a citation to a work on the history of the Fields Medals. And, in 
this latter work, [13], I found three very interesting quotations: 


“... he [Fields] spent a decade in Europe continuing his studies. ‘This
long period of study,...exercised a decisive influence on his life and 
outlook,... Of the connections which he established, perhaps the most 
important was an enduring friendship with Mittag-Leffler.” (p. 168) 


“It was from Fields that I heard of the difficulty between Nobel and 
Mittag-Leffler. I gather that it was a matter of personal jealousy, ...” (p. 168) 


“Perhaps I should insert here something that Fields told me and which I
later verified in Sweden, namely, that Nobel hated the mathematician 
Mittag-Leffler and decided that mathematics would not be one of the 
domains in which Nobel Prizes would be available.” (p. 171) 


There doesn’t seem to be any doubt that these last excerpts lend support to 
some sort of “Mittag-Leffler theory”. The plot thickens. The source of all three
quotations just above is John Lighton Synge, the executor of Fields’ will and a
person directly involved in the establishment of the Fields Medals. The “main
source” of the second respondents above—who call the Mittag-Leffler theory “an
academic fabrication with no credibility”—is a book by Ragnar Sohlman, the chief
executor of Nobel’s will and later director of the Nobel foundation. So, if we
restrict our “referees” on this matter to executors, “Why no Nobel Prize in 
Mathematics?” may still have the status of an unsolved problem. 

On the question concerning the Fields Medal to an economist—the answer 
would seem to be known. No Fields Medal has gone to an economist. However, it 
is interesting to note that Stephen Smale, a 1966 Fields Medalist, later did do 
“work in economic theory” and also “joined the economics department” ([11], p.
61). It is interesting to observe that the 1983 Nobel Laureate in Economic Science,
Gerard Debreu, was also a Plenary Lecturer at the 1974 International Congress of 
Mathematicians where he spoke on “Four Aspects of the Mathematical Theory of 
Economic Equilibrium.” Smale’s comments upon the announcement of Debreu’s 
Nobel, [11], include 


Debreu’s great contribution is his profound use of mathematics in the 
central theme of economic theory, consolidating an insight of Adam Smith 
more than 200 years ago. Debreu has given the foundations of general 
equilibrium theory in his classic work “Theory of Value.” The award of the 
Nobel prize to Debreu gives a valuable impetus of basic research in mathe- 
matical economics. 


A skimming of just the index of Theory of Value, [3], should convince a 
mathematician of the deep mathematical ideas used by some economists in their 
work. Of course, the mathematicalization of economics is well-known to most (and 
to all economics graduate students today) as is the fact that many recipients of this 
Nobel Prize, such as Debreu, Kenneth Arrow and Leonid Kantorovich, are also 
known as mathematicians. (For the interested reader descriptions of the work of 
Nobel Laureates in Economic Science can be found in [6] and in recent issues of 
The Journal of Economic Perspectives.) 

As far as I can tell, even though Fields’s endowment did not require it, no 
recipient of the Fields Medal has been over forty years of age. And, although not 
specified by the Central National Bank of Sweden who made possible the fund for 
the creation of the Nobel Prize in Economic Science in 1968, no Nobel Laureate in 
Economics has been under forty years of age. This might suggest, following from 
the New York Times headline, “Game Theory Captures a Nobel,” [8], an optimal 
strategy for young, prize-seeking mathematicians: If you are nearing forty, and
there are no prospects for a Fields, shift disciplines and go for the Nobel. Since it
has been claimed ([10], p. 222),


... simple models...in economics can exhibit dynamical behavior far 
more complex than anything found in classical physics or biology. In fact, all 
kinds of complicated dynamics (e.g., involving topological entropy, strange 
attractors, and even conditions yet to be found) already arise in elementary 
models that only describe how people exchange goods (a pure exchange 
model). 

Instead of being an anomaly, the mathematical source of this complexity is 
so common to the social sciences that I suspect it highlights a general 
problem plaguing these areas. If true, this assertion explains why it is difficult
to achieve progress in the social sciences while underscoring the need for
new mathematical tools.


Maybe one could win a Fields Medal AND a Nobel Prize, if they found the 
right “new mathematical tools.” 

By the way, I heard a new rumor about this Nobel Prize in Mathematics thing.
Fields was in Europe for a decade (1892-1902), primarily in Paris and Berlin
([12], p. 800). We already know that Fields made “an enduring friendship with
Mittag-Leffler”, a man who “...was very conscious of the importance of
maintaining a record of the contemporary history of mathematics for posterity...” ([9], p.
1725). Further, “In the mid-seventies Nobel settled in Paris”, but, “During the
final years (1893-1896) of his life he did spend some time in Sweden....” ([5], p.
74).

Here’s the rumor I heard: Sometime in 1895 there was a meeting, held at 
Nobel’s estate Bjorkborn, attended by only three men, Fields, Mittag-Leffler and
Nobel. There was a dispute between Fields and Nobel that apparently had begun 
in Paris a few years earlier, and Mittag-Leffler had been asked to arbitrate a 
solution. When the meeting was over and things had been decided, Nobel had 
gotten Literature, Physiology or Medicine, Physics, Chemistry and Peace; Fields 
had gotten Mathematics. As they were leaving, Fields was heard to have whispered 
to Mittag-Leffler “we sure put one over on him, didn’t we?”


FOR WHOM NOBEL TOLLS


It is a fact that Nobel Prizes 

Come in many shapes and sizes. 

But one is missing from the list: 

The Nobel Math Prize does not exist. 


There is a widely held suspicion 
Explaining this bizarre omission. 
It says that jealousy is at the crux 
Of why we get no Nobel bucks. 


For Alfred Nobel became aware 

Of his fiancée’s prior love affair 

With a mathematician, who held her tight 
And thought that she was DYNAMITE. 


Then Nobel, reacting as expected, 

Vowed, “Mathematicians shall be neglected! 
“And if it’s Sweden they want to see,

“Let them take a tour and pay the fee!” 


Now, Nobel’s behavior may be a disgrace, 

Yet wouldn’t the world be an even worse place 
Had his lover’s purported tryst 

Been, instead, with a... pacifist? 


—William Dunham 
Department of Mathematics 
Muhlenberg College 

2400 Chew Street 
Allentown, PA 18105 


Answer to Picture Puzzle 
(p. 887) 


The relation is that of equality: 
they are both Lars Hormander. 


892 A NOBEL PRIZE IN MATHEMATICS [December 


Some Exact Number Theory 
Computations via Probability 
Mechanisms 


Richard Blecksmith and Purushottam W. Laud 


1. INTRODUCTION. In this paper we apply stochastic methods to efficiently 
compute several number theoretic functions. The application of probability to 
number theory suggests density results. For example, the prime number theorem 
asserts that the probability that a number chosen between 1 and n is prime is 
approximately 1/log n. This is, of course, just an asymptotic estimate. Our goal 
here is to obtain exact results about functions involving the bit patterns of 
numbers. The results we describe can be generalized to other bases, notably base 
10, but we work with base 2 for simplicity and ease of computations. 

The first problem we address is how to select a random number x between 0 
and a fixed bound n, written in binary as 


\[ n = (\epsilon_1 \epsilon_2 \cdots \epsilon_k)_2. \tag{1} \]


We assume $\epsilon_1 = 1$, so that $k = k_n$ is the number of bits in the binary expansion of
n. We wish to select x by choosing its bits at random, from left to right. The
problem is that we are not initially free to choose the beginning, left-most bits of x
as we please. As long as the bits of the number we are forming agree with the bits
of our bound n, then we are forced to choose 0 for the next bit of x if 0 is the
corresponding bit of n, otherwise x will exceed n. Once we break the initial bit
pattern of n, we may choose the next bit of x to be 0 or 1, each with probability 1/2.
The bit selection chain (or BSC) we describe is a Markov chain on the following 
discrete state space: 


1: bits of x to date do not match those of n and the current bit of x is 0 
2: bits of x to date do not match those of n and the current bit of x is 1 
3: bits of x to date match those of n and the current bit of x is 0 


4: bits of x to date match those of n and the current bit of x is 1. 

In order to determine the transition matrix of BSC, we need the function

\[ p_i = \frac{2^{k-i}}{1 + \sum_{j=i}^{k} \epsilon_j 2^{k-j}}. \tag{2} \]

If the ith bit of n is 1, i.e. $\epsilon_i = 1$, then $p_i$ is the probability of the ith bit of x being
0 given that the first i − 1 bits of x match those of n. Observe that in the
denominator of $p_i$, the sum $\sum_{j=i}^{k} \epsilon_j 2^{k-j} = (\epsilon_i \cdots \epsilon_k)_2$, the number formed by
truncating the first i − 1 leading bits of n. The transition (probability) matrix for
the ith transition is given by

\[ P_i = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 \\
p_i \epsilon_i & 0 & 1 - \epsilon_i & (1 - p_i)\epsilon_i \\
p_i \epsilon_i & 0 & 1 - \epsilon_i & (1 - p_i)\epsilon_i
\end{pmatrix}. \]

The four dimensional vector $\pi^{(0)}$ is the initial distribution on the state space. After
each transition, we have a distribution of the state space

\[ \pi^{(i)} = \pi^{(i-1)} P_i. \]

We denote the mth element of $\pi^{(i)}$ by $\pi_m^{(i)}$.
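
One way to see that the bit selection chain does what is intended is to walk every bit path and confirm that each x between 0 and n is produced with probability 1/(n + 1). The sketch below does this with exact fractions. The helper names are hypothetical, the matrix entries follow formula (2) and the four states listed above, and the starting state 3 is the initial distribution (0 0 1 0) that the paper adopts in Section 2.

```python
from fractions import Fraction

def transition_matrices(n):
    """Return [P_1, ..., P_k] as 4x4 lists (states 1..4) for a bound n >= 1."""
    eps = [int(c) for c in bin(n)[2:]]
    k = len(eps)
    half = Fraction(1, 2)
    mats = []
    for i in range(1, k + 1):
        e = eps[i - 1]
        tail = n % (2 ** (k - i + 1))              # (eps_i ... eps_k) in base 2
        p = Fraction(2 ** (k - i), 1 + tail)       # formula (2)
        match_row = [p * e, Fraction(0), Fraction(1 - e), (1 - p) * e]
        mats.append([[half, half, 0, 0], [half, half, 0, 0],
                     match_row, list(match_row)])
    return mats

def number_distribution(n):
    """Probability of each generated x, found by following all bit paths."""
    paths = {(3, 0): Fraction(1)}                  # (state, value so far) -> prob
    for P in transition_matrices(n):
        nxt = {}
        for (state, value), prob in paths.items():
            for new_state in (1, 2, 3, 4):
                step = P[state - 1][new_state - 1]
                if step:
                    bit = 0 if new_state in (1, 3) else 1   # states 1, 3 emit bit 0
                    key = (new_state, 2 * value + bit)
                    nxt[key] = nxt.get(key, 0) + prob * step
        paths = nxt
    dist = {}
    for (_, value), prob in paths.items():
        dist[value] = dist.get(value, 0) + prob
    return dist

if __name__ == "__main__":
    print(number_distribution(12))
```

For n = 12 every value from 0 to 12 should appear with probability 1/13.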


2. BLOCKS OF NUMBERS. We assume the binary expansion of n is given by (1).
We define $b_n$, the number of blocks of n, to be one more than the number of i,
1 ≤ i ≤ k − 1, such that $\epsilon_i \ne \epsilon_{i+1}$, for n positive; $b_0 = 0$. This definition for an
arbitrary base b appears in [1] where it is shown that the number of blocks of $a^n$
written base b goes to infinity with n, as long as log a/log b is irrational. Our goal
here is to compute the sums $\sum_{j=0}^{n} b_j$.

We view each of the n + 1 numbers between 0 and n, inclusive, as consisting of
k bits, allowing leading bits to be 0. We have (n + 1)k opportunities for a switch
and the number of switches is $\sum_{j=0}^{n} b_j$. We construct a probability mechanism that
selects one of these opportunities at random (with equal probability). Then

\[ \Pr(\text{switch}) = \frac{\sum_{j=0}^{n} b_j}{(n+1)k}. \]


The mechanism is to first select a random number by employing the bit selection
chain (BSC). Then independently select a “spot” from 1, ..., k. Thus,

\[ \begin{aligned}
\Pr(\text{switch}) &= \sum_{i=1}^{k} \Pr(\text{ith spot is selected and we have a switch there}) \\
&= \sum_{i=1}^{k} \Pr(\text{ith spot is selected}) \Pr(\text{switch} \mid \text{ith spot is selected}) \\
&= \frac{1}{k} \sum_{i=1}^{k} \Pr(\text{switch at the ith spot}).
\end{aligned} \]

We choose the initial distribution to be

\[ \pi^{(0)} = (0 \;\; 0 \;\; 1 \;\; 0). \]

Summing over the four states 1, ..., 4,

\[ \begin{aligned}
\Pr(\text{switch at the ith spot})
&= \sum_{m=1}^{4} \Pr(\text{BSC is in state } m \text{ at the } (i-1)\text{st spot}) \\
&\qquad \times \Pr(\text{ith transition of BSC entails a switch} \mid \text{BSC is in state } m \text{ at the } (i-1)\text{st spot}) \\
&= \sum_{m=1}^{4} \pi_m^{(i-1)} \times (\text{sum of the elements of the mth row of } P_i \text{ that result in a switch}).
\end{aligned} \]

Applying this to the transition matrix $P_i$, we find

\[ \Pr(\text{switch at ith spot}) = \pi_1^{(i-1)} P_i(1,2) + \pi_2^{(i-1)} P_i(2,1) + \pi_3^{(i-1)} P_i(3,4) + \pi_4^{(i-1)} \bigl(P_i(4,1) + P_i(4,3)\bigr). \tag{3} \]


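Example 1. For n = 12 = $(1100)_2$, compute $\sum_{j=0}^{12} b_j$.
The four transition matrices are

\[ P_1 = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac{8}{13} & 0 & 0 & \tfrac{5}{13} \\
\tfrac{8}{13} & 0 & 0 & \tfrac{5}{13}
\end{pmatrix}, \qquad
P_2 = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac45 & 0 & 0 & \tfrac15 \\
\tfrac45 & 0 & 0 & \tfrac15
\end{pmatrix}, \qquad \text{and} \qquad
P_3 = P_4 = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}. \]

Hence the probability distribution vectors $\pi^{(0)}, \ldots, \pi^{(3)}$ (we don’t need $\pi^{(4)}$) are:
(0, 0, 1, 0), (8/13, 0, 0, 5/13), (8/13, 4/13, 0, 1/13), and (6/13, 6/13, 1/13, 0). A routine calculation gives

\[ \Pr(\text{switch}) = \frac14 \sum_{i=1}^{4} \Pr(\text{switch at spot } i) = \frac14 \Bigl(\frac{5}{13} + \frac{8}{13} + \frac{7}{13} + \frac{6}{13}\Bigr) = \frac12. \]

Thus,

\[ \sum_{j=0}^{12} b_j = (n + 1) k \Pr(\text{switch}) = 13 \cdot 4 \cdot \frac12 = 26, \]

which can be verified directly.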
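
The direct verification, and the chain computation for other values of n, can be sketched as follows. This is an illustrative implementation under the assumptions already made in the text (initial state 3, formula (2) for $p_i$, formula (3) for the switch probability); `blocks` and `block_sum_via_chain` are hypothetical names.

```python
from fractions import Fraction

def blocks(j):
    """b_j: number of maximal runs of equal bits in the binary expansion; b_0 = 0."""
    if j == 0:
        return 0
    s = bin(j)[2:]
    return 1 + sum(s[t] != s[t + 1] for t in range(len(s) - 1))

def block_sum_via_chain(n):
    """Sum of b_j for j = 0..n, computed as (n + 1) * k * Pr(switch)."""
    eps = [int(c) for c in bin(n)[2:]]
    k = len(eps)
    half = Fraction(1, 2)
    pi = [Fraction(0), Fraction(0), Fraction(1), Fraction(0)]   # start in state 3
    switch_total = Fraction(0)
    for i in range(1, k + 1):
        e = eps[i - 1]
        tail = n % (2 ** (k - i + 1))               # (eps_i ... eps_k) in base 2
        p = Fraction(2 ** (k - i), 1 + tail)        # formula (2)
        row = [p * e, Fraction(0), Fraction(1 - e), (1 - p) * e]
        P = [[half, half, 0, 0], [half, half, 0, 0], row, row]
        # formula (3): transitions in which the current bit changes
        switch_total += (pi[0] * P[0][1] + pi[1] * P[1][0]
                         + pi[2] * P[2][3] + pi[3] * (P[3][0] + P[3][2]))
        pi = [sum(pi[m] * P[m][c] for m in range(4)) for c in range(4)]
    pr_switch = switch_total / k                    # each spot chosen with prob 1/k
    return (n + 1) * k * pr_switch

if __name__ == "__main__":
    n = 12
    print(block_sum_via_chain(n), sum(blocks(j) for j in range(n + 1)))
```

The chain needs only k matrix steps, whereas the direct sum inspects all n + 1 numbers, which is the efficiency gain the method is aiming for.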


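Example 2. Compute $\sum_{j=0}^{n} b_j$ for $n = 2^{k-1}$.


Since $p_1 = 2^{k-1}/(n + 1)$ and $1 - p_1 = 1/(n + 1)$, the first transition matrix is

\[ P_1 = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac{2^{k-1}}{n+1} & 0 & 0 & \tfrac{1}{n+1} \\
\tfrac{2^{k-1}}{n+1} & 0 & 0 & \tfrac{1}{n+1}
\end{pmatrix}. \]

Since all the other bits in the binary expansion of $2^{k-1}$ are 0’s, for i ≥ 2 each
transition matrix is

\[ P_i = P = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}. \]

Observe that P is an idempotent, so each power of P is just P. Thus, for i ≥ 2, we
have

\[ P_1 P_2 \cdots P_i = P_1 P = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac{2^{k-2}}{n+1} & \tfrac{2^{k-2}}{n+1} & \tfrac{1}{n+1} & 0 \\
\tfrac{2^{k-2}}{n+1} & \tfrac{2^{k-2}}{n+1} & \tfrac{1}{n+1} & 0
\end{pmatrix}. \]

The distribution vectors are the third rows of these products:

\[ \pi^{(0)} = (0 \;\; 0 \;\; 1 \;\; 0), \qquad
\pi^{(1)} = \Bigl(\frac{2^{k-1}}{n+1} \;\; 0 \;\; 0 \;\; \frac{1}{n+1}\Bigr), \qquad
\pi^{(i)} = \Bigl(\frac{2^{k-2}}{n+1} \;\; \frac{2^{k-2}}{n+1} \;\; \frac{1}{n+1} \;\; 0\Bigr) \text{ for } i \ge 2. \]

Applying (3) and simplifying, we find

\[ \Pr(\text{switch at spot } 1) = \frac{1}{n+1}, \qquad
\Pr(\text{switch at spot } 2) = \frac{2^{k-2} + 1}{n+1}, \qquad
\Pr(\text{switch at spot } i) = \frac{2^{k-2}}{n+1} \]

for i > 2. Thus, for $n = 2^{k-1}$,

\[ \sum_{j=0}^{n} b_j = 1 + (2^{k-2} + 1) + (k - 2)\,2^{k-2} = (k - 1)\,2^{k-2} + 2. \]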
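
A quick numerical check of this example is sketched below. The closed form tested, $(k - 1)2^{k-2} + 2$ for k ≥ 2, is the value obtained here by summing the three switch probabilities from (3) over the k spots; treat it as a reconstruction of the printed result rather than a quotation. The helper names are hypothetical.

```python
def blocks(j):
    """b_j: number of maximal runs of equal bits in the binary expansion; b_0 = 0."""
    if j == 0:
        return 0
    s = bin(j)[2:]
    return 1 + sum(s[t] != s[t + 1] for t in range(len(s) - 1))

def block_sum(n):
    """Direct evaluation of b_0 + b_1 + ... + b_n."""
    return sum(blocks(j) for j in range(n + 1))

def closed_form(k):
    """Candidate value of the block sum for n = 2**(k - 1), valid for k >= 2."""
    return (k - 1) * 2 ** (k - 2) + 2

if __name__ == "__main__":
    for k in range(2, 11):
        n = 2 ** (k - 1)
        print(k, n, block_sum(n), closed_form(k))
```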


3. SUM OF THE RATIOS OF BLOCKS TO BITS. We denote the ratio of blocks
to bits by

\[ \delta_j = \frac{b_j}{k_j}, \]

where $k_j$ is the number of bits in the binary representation of j. Given a fixed
integer n whose binary representation is (1), our goal is to compute the sum

\[ \sum_{j=1}^{n} \delta_j = \sum_{j=1}^{n} \frac{1}{k_j} \sum_{i=1}^{k} S_{ij}, \]

where the indicator variable $S_{ij}$ is 1 if there is a switch at the ith bit of j, zero
otherwise. To account for the denominator $k_j$ in $\delta_j$, we develop a probability
mechanism that modifies BSC to generate integers between 1 and n with the
following three conditions: (1) selecting an integer j is independent from selecting
a bit i; (2) each bit is selected with equal probability 1/k; and (3) each integer j is
selected with probability proportional to $1/k_j$. We call the constant of
proportionality in the third condition 1/c. To find c, note that since we are generating
numbers j between 1 and n,

\[ 1 = \sum_{j=1}^{n} \Pr(\text{number selected is } j) = \sum_{j=1}^{n} \frac{1}{c k_j}, \]

so that

\[ c = \sum_{j=1}^{n} \frac{1}{k_j} = \sum_{i=1}^{k} \frac{1}{k - i + 1} \cdot (\text{number of } (k - i + 1)\text{-bit integers} \le n). \]

Since $1 + \sum_{j=2}^{k} \epsilon_j 2^{k-j}$ enumerates the k-bit numbers between 1 and n, while $2^{k-i}$
enumerates the (k − i + 1)-bit numbers < n (for i ≥ 2), we can effectively compute the
normalizing constant c by the formula

\[ c = \frac{1 + \sum_{j=2}^{k} \epsilon_j 2^{k-j}}{k} + \sum_{i=2}^{k} \frac{2^{k-i}}{k - i + 1}. \tag{4} \]
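
Formula (4) can be compared with the defining sum for c on small cases. The sketch below assumes the reading of (4) as the count of k-bit numbers up to n divided by k, plus $2^{k-i}/(k - i + 1)$ summed over i = 2, ..., k; `c_direct` and `c_formula` are hypothetical names.

```python
from fractions import Fraction

def c_direct(n):
    """c as the sum of 1/k_j over j = 1..n, with k_j the bit length of j."""
    return sum(Fraction(1, j.bit_length()) for j in range(1, n + 1))

def c_formula(n):
    """c from formula (4), using only the bits of n."""
    k = n.bit_length()
    # 1 + sum_{j=2}^k eps_j 2^(k-j): how many k-bit numbers lie between 1 and n
    top = n - 2 ** (k - 1) + 1
    return Fraction(top, k) + sum(Fraction(2 ** (k - i), k - i + 1)
                                  for i in range(2, k + 1))

if __name__ == "__main__":
    for n in (1, 12, 100, 1000):
        print(n, c_direct(n), c_formula(n))
```

The direct sum costs n terms, while the formula needs only k terms, one per bit of n.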


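Since $S_{ij}$ is a Bernoulli random variable, the expected value of $S_{ij}$ is precisely the
probability that $S_{ij} = 1$. It follows from our three conditions that

\[ \begin{aligned}
\Pr(\text{switch}) = E(S_{ij}) &= \sum_{i=1}^{k} \sum_{j=1}^{n} S_{ij} \Pr(\text{number selected is } j \text{ and bit selected is } i) \\
&= \sum_{i=1}^{k} \sum_{j=1}^{n} S_{ij} \Pr(\text{number selected is } j) \Pr(\text{bit selected is } i) \\
&= \sum_{i=1}^{k} \sum_{j=1}^{n} S_{ij} \, \frac{1}{c k_j} \cdot \frac{1}{k} = \frac{1}{kc} \sum_{j=1}^{n} \delta_j.
\end{aligned} \]

Thus

\[ \sum_{j=1}^{n} \delta_j = kc \Pr(\text{switch}). \]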


Now to generate a random integer $j$ with probability proportional to $1/k_j$, we
adjust the state space of the BSC Markov chain as follows:

1-4: as in BSC, with the additional constraint that not all bits to date are 0

5: all bits to date are 0.


To compute the transition matrix for this expanded state space, note that the
probability of each transition in the subchain of states 1-4 is precisely that for
BSC. It is clearly impossible to move from a state 1-4 to state 5. The task remains
of finding the probability of leaving state 5 at any particular stage $i$ in the Markov
chain. Let $s_i$ denote the conditional probability that $k_j = k-i+1$ given
$k_j \le k-i+1$, that is, the probability that the $i$th bit is 1 given all previous bits
are 0. To make sure that the marginal probability is $1/(c(k-i+1))$ for the event
"a specific $j$ is generated having $k-i+1$ bits", we must have for $i = 2, \ldots, k$,

$$\Pr(\text{bits } 1, \ldots, i-1 = 0,\ \text{bit } i = 1) = \prod_{m=1}^{i-1} (1 - s_m)\, s_i = \frac{2^{k-i}}{c(k-i+1)}$$

since there are $2^{k-i}$ such numbers. This gives the recursive formula

$$s_i = \frac{2^{k-i}}{c(k-i+1) \prod_{m=1}^{i-1} (1 - s_m)}. \tag{5a}$$

To obtain $s_1$, notice that the given event "all previous bits are 0" has probability 1
and recall that the probability a specific $k$-bit number is generated should equal
$1/(ck)$. Since there are $1 + \sum_{i=2}^{k} \varepsilon_i 2^{k-i}$ such numbers between 1 and $n$, we have

$$s_1 = \frac{1 + \sum_{i=2}^{k} \varepsilon_i 2^{k-i}}{ck}. \tag{5b}$$


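We can now determine the transition matrix $P_i$ for this Markov chain*

$$P_i = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
p_i\varepsilon_i & 0 & 1-\varepsilon_i & (1-p_i)\varepsilon_i & 0 \\
p_i\varepsilon_i & 0 & 1-\varepsilon_i & (1-p_i)\varepsilon_i & 0 \\
0 & s_i & 0 & 0 & 1-s_i
\end{pmatrix}.$$

The distribution vector is

$$\pi^{(i)} = \pi^{(i-1)} P_i, \qquad \pi^{(1)} = (0 \quad 0 \quad 0 \quad s_1 \quad 1-s_1).$$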


By our probability mechanism,

$$\Pr(\text{switch at first spot}) = s_1$$

and for $i \ge 2$,

$$\begin{aligned}
\Pr(\text{switch at } i\text{th spot}) = {} & \pi_1^{(i-1)} P_i(1,2) + \pi_2^{(i-1)} P_i(2,1) + \pi_3^{(i-1)} P_i(3,4) \\
& + \pi_4^{(i-1)} \bigl(P_i(4,1) + P_i(4,3)\bigr) + \pi_5^{(i-1)} P_i(5,2).
\end{aligned} \tag{6}$$

This is just (3) with the extra term to account for the fifth state. Finally, by the
requirements that each spot is chosen with equal probability and independently of
the number selected, the argument of the previous section applies to give

$$\Pr(\text{switch}) = \frac{1}{k} \sum_{i=1}^{k} \Pr(\text{switch at } i\text{th spot}).$$


Example 3. For $n = 12 = (1100)_2$, compute $\sum_{j=1}^{12} \delta_j$.
Formulas (4) and (5) give

$$c = \frac{55}{12}, \qquad s_1 = \frac{3}{11}.$$


*For notational simplicity, we re-use the letters $P$ and $\pi$ for the transition matrix and distribution
vector, and do not worry about distinguishing them from their counterparts in the BSC chain. The
appropriate usage is always clear from context.
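The three transition matrices are

$$P_2 = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
\tfrac45 & 0 & 0 & \tfrac15 & 0 \\
\tfrac45 & 0 & 0 & \tfrac15 & 0 \\
0 & \tfrac25 & 0 & 0 & \tfrac35
\end{pmatrix}, \qquad
P_3 = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & \tfrac12 & 0 & 0 & \tfrac12
\end{pmatrix}, \quad\text{and}$$

$$P_4 = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
\tfrac12 & \tfrac12 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0
\end{pmatrix}.$$

Hence the probability distribution vectors $\pi^{(1)}, \ldots, \pi^{(4)}$ are:

$$\left(0, 0, 0, \tfrac{3}{11}, \tfrac{8}{11}\right), \quad
\left(\tfrac{12}{55}, \tfrac{16}{55}, 0, \tfrac{3}{55}, \tfrac{24}{55}\right), \quad
\left(\tfrac{14}{55}, \tfrac{26}{55}, \tfrac{3}{55}, 0, \tfrac{12}{55}\right), \quad
\left(\tfrac{20}{55}, \tfrac{32}{55}, \tfrac{3}{55}, 0, 0\right).$$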


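A routine calculation using (6) gives

$$\Pr(\text{switch}) = \frac14 \sum_{i=1}^{4} \Pr(\text{switch at spot } i) = \frac14 \left( \frac{3}{11} + \frac{28}{55} + \frac{29}{55} + \frac{32}{55} \right) = \frac{26}{55}.$$

Thus,

$$\sum_{j=1}^{12} \delta_j = kc \Pr(\text{switch}) = 4 \cdot \frac{55}{12} \cdot \frac{26}{55} = \frac{26}{3},$$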


which can be verified by a direct calculation. 
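The direct calculation can be sketched in a few lines of Python with exact rational arithmetic. This is an illustrative check, not the authors' program; the function names are assumptions introduced here.

```python
from fractions import Fraction


def blocks(j: int) -> int:
    """Number of maximal runs of equal bits in the binary expansion of j >= 1."""
    bits = bin(j)[2:]
    return 1 + sum(a != b for a, b in zip(bits, bits[1:]))


def delta_sum(n: int) -> Fraction:
    """Exact sum of delta_j = blocks(j) / k_j over 1 <= j <= n."""
    return sum((Fraction(blocks(j), j.bit_length()) for j in range(1, n + 1)), Fraction(0))


def normalizing_constant(n: int) -> Fraction:
    """c = sum of 1 / k_j over 1 <= j <= n, so that Pr(j) = 1 / (c * k_j)."""
    return sum((Fraction(1, j.bit_length()) for j in range(1, n + 1)), Fraction(0))


n = 12
k = n.bit_length()              # number of bits of n
c = normalizing_constant(n)     # normalizing constant of formula (4)
total = delta_sum(n)            # the sum the example asks for
pr_switch = total / (k * c)     # rearranged from: sum = k * c * Pr(switch)
print(c, total, pr_switch)
```

Using `Fraction` avoids the floating point round-off that the paper later removes by multiplying with a least common multiple.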


4. RUDIN-SHAPIRO SEQUENCE. The Rudin-Shapiro sequence $\{a_n\}$ is defined
by

$$a_n = (-1)^{\sum_{i=1}^{k-1} \varepsilon_i \varepsilon_{i+1}},$$

where the binary representation of $n$ is (1). That is, the sign of $a_n$ depends on the
parity of the number of pairs of consecutive 1's in the binary expansion of $n$. The
sequence $\{a_n\}$ relates to a variety of interesting applications from computing the
orientation of creases in paper folding to spin theory related to Ising models in
high energy physics. (See [4].) In this section we wish to evaluate the sum

$$s_n = \sum_{j=0}^{n} a_j$$

of the Rudin-Shapiro coefficients. Brillhart and Morton give formulas in [2] for
calculating $s_n$ based on doubling and shifting the index. In a subsequent paper
with Erdős, they examine further analytical properties of $s_n$. We approach the
problem of computing $s_n$ by viewing the selection of a number $j$ between 0 and $n$
with Rudin-Shapiro coefficient $a_j = 1$ as a probability, which can be calculated by
a Markov chain. As we form the bits of $x$ by BSC, we keep track of the parity of
the number of pairs of consecutive 1's in the bits of $x$ to date. We include a parity
switch in our BSC state space. This switch is 0 or 1 depending on whether the number of
pairs of successive 1's to date is even or odd. The state space for the Rudin-Shapiro
chain is


1: bits of x to date do not match those of n, the current bit is 0, and the current 
parity switch is 0 

2: bits of x to date do not match those of n, the current bit is 1, and the current 
parity switch is 0 

3: bits of x to date match those of n, the current bit is 0, and the current parity 
switch is 0 

4: bits of x to date match those of n, the current bit is 1, and the current parity 
switch is 0 

5: bits of x to date do not match those of n, the current bit is 0, and the current 
parity switch is 1 

6: bits of x to date do not match those of n, the current bit is 1, and the current 
parity switch is 1 

7: bits of x to date match those of n, the current bit is 0, and the current parity 
switch is 1 

8: bits of x to date match those of n, the current bit is 1, and the current parity 
switch is 1. 


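A transition from states 1-4 to states 5-8, and vice-versa, occurs only if the current
bit of both states is 1, so that the parity switch changes. Hence the transition
matrix for the Rudin-Shapiro sum is

$$P_i = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 & 0 & 0 & 0 & 0 \\
\tfrac12 & 0 & 0 & 0 & 0 & \tfrac12 & 0 & 0 \\
p_i\varepsilon_i & 0 & 1-\varepsilon_i & (1-p_i)\varepsilon_i & 0 & 0 & 0 & 0 \\
p_i\varepsilon_i & 0 & 1-\varepsilon_i & 0 & 0 & 0 & 0 & (1-p_i)\varepsilon_i \\
0 & 0 & 0 & 0 & \tfrac12 & \tfrac12 & 0 & 0 \\
0 & \tfrac12 & 0 & 0 & \tfrac12 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & p_i\varepsilon_i & 0 & 1-\varepsilon_i & (1-p_i)\varepsilon_i \\
0 & 0 & 0 & (1-p_i)\varepsilon_i & p_i\varepsilon_i & 0 & 1-\varepsilon_i & 0
\end{pmatrix}.$$

Observe that $P_i$ has a block pattern that strongly resembles the BSC transition
matrix $P_i^{BSC}$. If $e_{ij}$ denotes the 4 × 4 matrix consisting of a 1 in position $(i, j)$,
with all other entries 0, then

$$P_i = \begin{pmatrix} A_i & B_i \\ B_i & A_i \end{pmatrix},$$

where $B_i = \tfrac12 e_{22} + (1 - p_i)\varepsilon_i e_{44}$, and $A_i = P_i^{BSC} - B_i$. The sum of the
Rudin-Shapiro coefficients $s_n$ is given by

$$\frac{s_n}{n+1} = \pi^{(0)} P_1 P_2 \cdots P_k \sigma, \tag{7}$$

where

$$\pi^{(0)} = [0 \ \ 0 \ \ 1 \ \ 0 \ \ 0 \ \ 0 \ \ 0 \ \ 0] \quad\text{and}\quad
\sigma = [1 \ \ 1 \ \ 1 \ \ 1 \ \ {-1} \ \ {-1} \ \ {-1} \ \ {-1}]^T.$$

The premultiplying vector $\pi^{(0)}$ identifies 3 as the initial state. The postmultiplying
vector $\sigma$ gives the correct sign to the final distribution, i.e. the portion ending in
states 1-4 represents those $j$'s with $a_j = 1$, and the portion ending in states 5-8 represents
those $j$'s with $a_j = -1$.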
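Formula (7) can be sketched in Python with exact fractions and compared against a brute-force sum. This is an illustrative reading of the chain, not the authors' code. In particular, the BSC probability $p_i$ is not restated in this section, so the sketch assumes $p_i = 2^{k-i} / (1 + \sum_{m \ge i} \varepsilon_m 2^{k-m})$, the chance that the next bit is 0 when $x$ is uniform on $\{0, \ldots, n\}$ and has matched $n$ so far; all function names are hypothetical.

```python
from fractions import Fraction


def rudin_shapiro(j: int) -> int:
    """a_j = (-1)^(number of pairs of consecutive 1's in the binary expansion of j)."""
    return -1 if bin(j & (j >> 1)).count("1") % 2 else 1


def brute_force_sum(n: int) -> int:
    return sum(rudin_shapiro(j) for j in range(n + 1))


def transition(p: Fraction, eps: int) -> list:
    """8 x 8 matrix P_i; states 1-8 of the text are indices 0-7 here."""
    h, z = Fraction(1, 2), Fraction(0)
    a = p * eps            # matched so far, new bit 0 while eps = 1: x falls below n
    b = Fraction(1 - eps)  # matched so far, eps = 0: new bit is 0 and x still matches
    c = (1 - p) * eps      # matched so far, new bit 1 = eps: x still matches
    return [
        [h, h, z, z, z, z, z, z],
        [h, z, z, z, z, h, z, z],
        [a, z, b, c, z, z, z, z],
        [a, z, b, z, z, z, z, c],
        [z, z, z, z, h, h, z, z],
        [z, h, z, z, h, z, z, z],
        [z, z, z, z, a, z, b, c],
        [z, z, z, c, a, z, b, z],
    ]


def chain_sum(n: int) -> Fraction:
    """Evaluate (n + 1) * pi0 * P_1 * ... * P_k * sigma for n >= 1."""
    k = n.bit_length()
    dist = [Fraction(0)] * 8
    dist[2] = Fraction(1)                        # state 3 is the initial state
    for i in range(1, k + 1):
        eps = (n >> (k - i)) & 1                 # i-th bit of n, most significant first
        tail = n & ((1 << (k - i + 1)) - 1)      # value of bits i..k of n
        p = Fraction(2 ** (k - i), tail + 1)     # assumed p_i; only used when eps = 1
        step = transition(p, eps)
        dist = [sum(dist[r] * step[r][col] for r in range(8)) for col in range(8)]
    sigma = [1, 1, 1, 1, -1, -1, -1, -1]
    return (n + 1) * sum(d * s for d, s in zip(dist, sigma))
```

The vector-matrix products are written out densely for clarity; exploiting the at most 16 nonzero entries per matrix would give the operation count quoted in Section 5.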
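Example 4. Compute $s_n = \sum_{j=0}^{n} a_j$ for $n = 2^{k-1}$.

For stages $2, \ldots, k$, each transition matrix has the form $P_i = P = \begin{pmatrix} A & B \\ B & A \end{pmatrix}$, where

$$A = \begin{pmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac12 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix} \quad\text{and}\quad
B = \begin{pmatrix}
0 & 0 & 0 & 0 \\
0 & \tfrac12 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}.$$

Since the transition matrices $P_i$ are constantly $P$ for $i \ge 2$, (7) becomes

$$\frac{s_n}{n+1} = \pi^{(0)} P_1 P^{k-1} \sigma.$$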

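We wish to evaluate $P^{k-1}$. Although $P$ is not an idempotent, as in Example 2, the
powers of $P$ have a fairly simple form: $P^{k-1} = \begin{pmatrix} A_{k-1} & B_{k-1} \\ B_{k-1} & A_{k-1} \end{pmatrix}$, where the blocks
$A_{k-1}$ and $B_{k-1}$ depend on the parity of $k$. Define

$$d = \begin{cases} 1 & \text{if } k \text{ is odd} \\ 2 & \text{if } k \text{ is even} \end{cases}
\qquad\text{and}\qquad e = \frac{k-d}{2}.$$

If

$$\alpha = \frac14 + \frac{1}{d\,2^{e+1}} \quad\text{and}\quad \beta = \frac14 - \frac{1}{d\,2^{e+1}},$$

then

$$A_{k-1} = \begin{pmatrix} \alpha & \alpha & 0 & 0 \\ \alpha & \beta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\quad\text{and}\quad
B_{k-1} = \begin{pmatrix} \beta & \beta & 0 & 0 \\ \beta & \alpha & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\quad\text{if } k \text{ is even,}$$

$$A_{k-1} = \begin{pmatrix} \alpha & \tfrac14 & 0 & 0 \\ \tfrac14 & \alpha & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\quad\text{and}\quad
B_{k-1} = \begin{pmatrix} \beta & \tfrac14 & 0 & 0 \\ \tfrac14 & \beta & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\quad\text{if } k \text{ is odd.}$$

This formula can be proved by straightforward induction on $k$. The blocks for
$P^{k} = \begin{pmatrix} A_k & B_k \\ B_k & A_k \end{pmatrix}$ are given by

$$A_k = AA_{k-1} + BB_{k-1} \quad\text{and}\quad B_k = AB_{k-1} + BA_{k-1}.$$

By examining the cases (i) $k$ is even and (ii) $k$ is odd, it is easy to see that $A_k$ and
$B_k$ satisfy the above formula for the appropriate values of $e$, $\alpha$, and $\beta$.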
The product $\pi^{(0)} P_1$ is just the third row of $P_1$. Since the probability used in
computing the first transition matrix $P_1$ is $p_1 = 2^{k-1}/(n+1) = 1 - 1/(n+1)$,
we have

$$\pi^{(1)} = \pi^{(0)} P_1 = \left[ \frac{2^{k-1}}{n+1} \ \ 0 \ \ 0 \ \ \frac{1}{n+1} \ \ 0 \ \ 0 \ \ 0 \ \ 0 \right].$$

By our formula for $P^{k-1}$, the product $P^{k-1}\sigma$ is readily computed by subtracting
the sum of columns 5-8 of $P^{k-1}$ from the sum of columns 1-4. Combining both
cases where $k$ is even or odd, we find

$$P^{k-1}\sigma = [\, d(\alpha-\beta) \ \ d(\alpha-\beta) \ \ 1 \ \ 1 \ \ d(\beta-\alpha) \ \ d(\beta-\alpha) \ \ {-1} \ \ {-1} \,]^T.$$

Thus, $s_n$ is $(n+1)$ times the product of these two vectors:

$$s_n = d(\alpha-\beta) \cdot 2^{k-1} + 1 = 2^{k-1-e} + 1,$$

since $\alpha - \beta = 1/(d\,2^{e})$. Substituting for $d$ and $e$ gives

$$s_n = \begin{cases} 2^{k/2} + 1 & \text{if } k \text{ is even} \\ 2^{(k-1)/2} + 1 & \text{if } k \text{ is odd.} \end{cases}$$
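The closed form for $n = 2^{k-1}$ is easy to confirm by brute force. The Python sketch below is an illustrative check with hypothetical helper names; it uses the fact that both cases of the formula equal $2^{\lfloor k/2 \rfloor} + 1$.

```python
def rudin_shapiro(j: int) -> int:
    """a_j = (-1)^(number of pairs of consecutive 1's in the binary expansion of j)."""
    return -1 if bin(j & (j >> 1)).count("1") % 2 else 1


def rs_sum(n: int) -> int:
    """Brute-force s_n = a_0 + a_1 + ... + a_n."""
    return sum(rudin_shapiro(j) for j in range(n + 1))


def closed_form(k: int) -> int:
    """2**(k/2) + 1 for even k and 2**((k-1)/2) + 1 for odd k, i.e. 2**(k // 2) + 1."""
    return 2 ** (k // 2) + 1


if __name__ == "__main__":
    for k in range(1, 12):
        print(k, rs_sum(2 ** (k - 1)), closed_form(k))
```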


5. COMPUTER IMPLEMENTATION AND CONJECTURES. We have written
computer programs to implement the three methods discussed in this paper. The
algorithms are extremely efficient. Consider, for example, the problem of computing
the Rudin-Shapiro sum $s_n$. The 8 × 8 transition matrix $P_i$ has either 14 or 16
nonzero entries, according as $\varepsilon_i = 0$ or 1. Thus, formula (7) requires at most
$16k + 8$ multiplies, where $k$ is the number of bits of $n$. For the block sum and
Rudin-Shapiro problem, the final answer is an integer. For the $\delta$ sum problem,
however, the final answer is a rational, stored in the computer as a floating point
number $x$. But note that the denominators in the sum are all $\le k$. We obtain the
fractional representation $a/b$ by multiplying $x$ by the least common multiple of
the integers $1, \ldots, k$.


We present a short table of the three functions for $n = 10^m$, $m = 1, \ldots, 10$:

  m    $\sum_{j=1}^{n} b_j$    $\sum_{j=1}^{n} \delta_j$    $s_n$
  1    21                      7 5/12                       7
  2    352                     61 61/210                    13
  3    5040                    562 521/630                  37
  4    66918                   5420 8659/12870              181
  5    848344                  54041.07                     473
  6    10048328                530201.49                    1111
  7    116578252               5223784.09                   5777
  8    1350523914              52609339.74                  16367
  9    15070502158             520912798.65                 42209
 10    166108468238            5146376588.29                189985


The data for $s_n$ are consistent with the result of Brillhart and Morton [2] that
$s_n/\sqrt{n}$ varies between $\sqrt{0.6}$ and $\sqrt{6}$ as $n \to \infty$. The data in the second column
agree with the expression $\tfrac12 n \log_2 n$ to 3 or 4 decimal places as we read down the
column. Such an asymptotic estimate is suggested by Example 2 where it is shown
that $\sum_{j=0}^{n} b_j = \tfrac12 n \log_2 n + 2$ when $n$ is a power of 2. Looking at the third column,
it appears that $\frac{1}{n} \sum_{j=1}^{n} \delta_j$ is slowly approaching .5. We generalize this to arbitrary
bases in the following


Conjecture. Given a fixed base $b$. Let $B(j)$ denote the number of blocks of $j$ in its base
$b$ representation and $D(j)$ denote the number of base $b$ digits. Then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} \frac{B(j)}{D(j)} = \frac{b-1}{b}.$$
The Angle Between
Complementary Subspaces


Ilse C. F. Ipsen and Carl D. Meyer


1. INTRODUCTION. Almost all linear algebra courses discuss angles between
vectors. The angle between two nonzero vectors $u$ and $v$ in $\mathbb{R}^n$ is defined as the
number $0 \le \theta \le \pi/2$ that satisfies

$$\cos\theta = \frac{|v^T u|}{\|v\|_2 \|u\|_2}.$$


Usually the discussion stops right there, and extensions to angles between sub- 
spaces of higher dimensions are, more or less tacitly, shoved under the rug. 
Perhaps this is because most instructors feel that such extensions are difficult to 
understand, or that further effort in this direction is not worthwhile. Indeed, this 
makes sense for angles between general subspaces because one would have to 
introduce concepts like gap or distance between subspaces [7, 12], principal (or 
canonical) angles [1, 2, 15, 12], the CS decomposition [11, 4, 10, 6, 12], and so on. 
These topics are better off in a more advanced course. 

However, angles between complementary subspaces are easier to deal with. The 
purpose of our article is to draw attention to some simple, though not very well 
known, expressions for the angle between complementary subspaces which are 
easily derived from the fundamental theorem of linear algebra [14] and elementary 
facts about matrix norms and projectors. 

Angles between complementary subspaces are not just academic. They arise, for 
instance, in the context of controller robustness [9, 16]. Roughly speaking, the 
spaces associated with the controller and the plant (a system described by a set of 
differential equations) are complementary subspaces. The robustness of the con- 
troller is defined by the smallest perturbation that renders the system unstable, 
which means that the associated subspaces are no longer complementary. The 
system remains stable as long as perturbations are smaller than the distance 
between the complementary subspaces. One measure of distance is the sine of the 
angle between the spaces. 


2. WHICH ANGLE? Before proving any theorems, we need to be precise about
which angle we are talking about. As the dimension $n$ grows beyond 2, so does
the wiggle room in $\mathbb{R}^n$, and there are a host of different angles which can be
defined between a pair of general subspaces. But since we wish to eventually
concentrate on complementary spaces, the concept of the minimal angle is the
most natural one to focus on.

Definition 2.1. For nonzero subspaces $\mathcal{R}, \mathcal{N} \subseteq \mathbb{R}^n$, the minimal angle between $\mathcal{R}$ and
$\mathcal{N}$ is defined to be the number $0 \le \theta \le \pi/2$ that satisfies

$$\cos\theta = \max_{\substack{u \in \mathcal{R},\ v \in \mathcal{N} \\ \|u\|_2 = \|v\|_2 = 1}} v^T u. \tag{2.1}$$

Notice that $\theta = 0$ if and only if $\mathcal{R} \cap \mathcal{N} \ne 0$, and $\theta = \pi/2$ if and only if $\mathcal{R} \perp \mathcal{N}$.


While (2.1) serves to define $\theta$, it is not easy to use, especially if one wants to
compute the value of $\theta$ for a given pair of subspaces. The trick in making $\theta$ more
accessible is to first think in terms of projections, and then to shift the emphasis to
$\sin\theta = (1 - \cos^2\theta)^{1/2}$.

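The development also requires some elementary facts concerning the standard
matrix 2-norm defined by

$$\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 \quad\text{for } A \in \mathbb{R}^{m \times n} \text{ and } x \in \mathbb{R}^{n \times 1}.$$

The following properties can be found (often as exercises) in standard texts.

$$\|A^T\|_2 = \|A\|_2 \tag{2.2}$$

$$\|XAY\|_2 = \|A\|_2 \ \text{ when } X \text{ has orthonormal columns and } Y \text{ has orthonormal rows} \tag{2.3}$$

$$\|A\|_2 = \max_{\|x\|_2 \le 1,\ \|y\|_2 \le 1} y^T A x \tag{2.4}$$

$$\|A\|_2 = \frac{1}{\min_{\|x\|_2 = 1} \|A^{-1}x\|_2} \ \text{ when } A^{-1} \text{ exists} \tag{2.5}$$

$$\left\| \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix} \right\|_2 = \max\{\|A\|_2, \|B\|_2\}. \tag{2.6}$$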


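The first step in unraveling (2.1) is to express $\cos\theta$ in terms of the orthogonal
projectors onto $\mathcal{R}$ and $\mathcal{N}$.


Theorem 2.1. If $P_{\mathcal{R}}$ and $P_{\mathcal{N}}$ are the orthogonal projectors onto $\mathcal{R}$ and $\mathcal{N}$, respectively,
then

$$\cos\theta = \|P_{\mathcal{N}} P_{\mathcal{R}}\|_2 = \|P_{\mathcal{R}} P_{\mathcal{N}}\|_2. \tag{2.7}$$

Proof: For vectors $x$ and $y$ such that $\|x\|_2 = \|y\|_2 = 1$, we have $P_{\mathcal{R}}x \in \mathcal{R}$ and
$P_{\mathcal{N}}y \in \mathcal{N}$ where $\|P_{\mathcal{R}}x\|_2 \le \|P_{\mathcal{R}}\|_2 \|x\|_2 \le 1$ and $\|P_{\mathcal{N}}y\|_2 \le \|P_{\mathcal{N}}\|_2 \|y\|_2 \le 1$, so that
(2.4) can be used to write

$$\cos\theta = \max_{\substack{u \in \mathcal{R},\ v \in \mathcal{N} \\ \|u\|_2 = \|v\|_2 = 1}} v^T u
= \max_{\substack{u \in \mathcal{R},\ v \in \mathcal{N} \\ \|u\|_2 \le 1,\ \|v\|_2 \le 1}} v^T u
= \max_{\substack{\|x\|_2 \le 1 \\ \|y\|_2 \le 1}} y^T P_{\mathcal{N}} P_{\mathcal{R}} x = \|P_{\mathcal{N}} P_{\mathcal{R}}\|_2.$$

The fact that $\|P_{\mathcal{N}} P_{\mathcal{R}}\|_2 = \|P_{\mathcal{R}} P_{\mathcal{N}}\|_2$ is a consequence of the symmetry of orthogonal
projectors together with (2.2). ∎


Theorem 2.1 does not depend on $\mathcal{R}$ and $\mathcal{N}$ being complementary subspaces; it
is a statement about the minimal angle between any two subspaces of $\mathbb{R}^n$. But in
the special case when $\mathcal{R}$ and $\mathcal{N}$ are complementary, there is a more natural
projector which gives rise to a formula which is simpler than (2.7).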
3. ENTER THE OBLIQUE PROJECTOR.


Definition 3.1. Subspaces $\mathcal{R}, \mathcal{N} \subseteq \mathbb{R}^n$ are said to be complementary whenever
$\mathcal{R} + \mathcal{N} = \mathbb{R}^n$ and $\mathcal{R} \cap \mathcal{N} = 0$, and this is denoted by writing $\mathcal{R} \oplus \mathcal{N} = \mathbb{R}^n$. The
associated oblique projector is the unique idempotent matrix $P$ whose range is $\mathcal{R}$ and
whose nullspace is $\mathcal{N}$. As an operator, $P$ projects vectors in $\mathbb{R}^n$ onto $\mathcal{R}$ along (or
parallel to) $\mathcal{N}$, and thus it acts as the identity on $\mathcal{R}$ and the zero operator on $\mathcal{N}$.


The goal is to simplify (2.7) in the case of complementary spaces by somehow
using the more natural oblique projector $P$ instead of the two orthogonal projectors
$P_{\mathcal{R}}$ and $P_{\mathcal{N}}$. But to realize a simplification, we must shift the emphasis to $\sin\theta$
rather than $\cos\theta$.


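Theorem 3.1. Suppose that $\mathcal{R}, \mathcal{N} \subseteq \mathbb{R}^n$ are nonzero complementary spaces, and let $P$
be the oblique projector onto $\mathcal{R}$ along $\mathcal{N}$. The minimal angle $\theta$ between $\mathcal{R}$ and $\mathcal{N}$
satisfies

$$\sin\theta = \frac{1}{\|P\|_2}. \tag{3.1}$$

Proof: Decompose $P$ in terms of its four fundamental subspaces by choosing
orthogonal matrices $U = (U_1 \mid U_2)$ and $V = (V_1 \mid V_2)$ in which the columns of $U_1$ and
$U_2$ constitute orthonormal bases for $\mathcal{R}$ and $\mathcal{R}^\perp$, respectively, and $V_1$ and $V_2$ are
orthonormal bases for $\mathcal{N}^\perp$ and $\mathcal{N}$, respectively, so that $U_i^T U_i = I$ and $V_i^T V_i = I$ for
$i = 1, 2$, and

$$P_{\mathcal{R}} = U_1 U_1^T, \quad I - P_{\mathcal{R}} = U_2 U_2^T, \quad P_{\mathcal{N}} = V_2 V_2^T, \quad I - P_{\mathcal{N}} = V_1 V_1^T.$$

The matrices $U$ and $V$ decompose $P$ in the sense that

$$U^T P V = \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} \quad\text{or, equivalently,}\quad
P = U \begin{pmatrix} C & 0 \\ 0 & 0 \end{pmatrix} V^T = U_1 C V_1^T, \tag{3.2}$$

in which $C = U_1^T P V_1$ is nonsingular. (For instance, one can choose $U$ and $V$ so that
this is the singular value decomposition of $P$.) Notice that $P^2 = P$ implies
$C = C V_1^T U_1 C$, which in turn insures $C^{-1} = V_1^T U_1$. Consequently, (2.3) together with
(2.5) implies that

$$\|P\|_2 = \|C\|_2 = \frac{1}{\min_{\|x\|_2 = 1} \|C^{-1}x\|_2} = \frac{1}{\min_{\|x\|_2 = 1} \|V_1^T U_1 x\|_2}.$$

Combining this with the result of Theorem 2.1 produces

$$\begin{aligned}
\sin^2\theta = 1 - \cos^2\theta &= 1 - \|P_{\mathcal{N}} P_{\mathcal{R}}\|_2^2 = 1 - \|V_2 V_2^T U_1 U_1^T\|_2^2 \\
&= 1 - \|(I - V_1 V_1^T) U_1\|_2^2 = 1 - \max_{\|x\|_2 = 1} \|(I - V_1 V_1^T) U_1 x\|_2^2 \\
&= 1 - \max_{\|x\|_2 = 1} x^T U_1^T (I - V_1 V_1^T) U_1 x = 1 - \max_{\|x\|_2 = 1} \bigl(1 - \|V_1^T U_1 x\|_2^2\bigr) \\
&= \min_{\|x\|_2 = 1} \|V_1^T U_1 x\|_2^2 = \frac{1}{\|P\|_2^2}. \qquad ∎
\end{aligned}$$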
The expression $\sin\theta = 1/\|P\|_2$ is not only conceptually simple, but, as illustrated
in Figure 1, there is also a particularly nice picture that accompanies it. The
image of the unit sphere in $\mathbb{R}^3$ under $P$ is obtained by projecting all vectors on the
sphere onto $\mathcal{R}$ along lines parallel to $\mathcal{N}$. The result is an ellipse in $\mathcal{R}$.


Figure 1


The norm of a longest vector $v$ on this ellipse equals the norm of $P$, i.e.

$$\|v\|_2 = \max_{\|x\|_2 = 1} \|Px\|_2 = \|P\|_2.$$

It is apparent from the right triangle in Figure 1 that

$$\sin\theta = \frac{\|x\|_2}{\|v\|_2} = \frac{1}{\|v\|_2} = \frac{1}{\|P\|_2}.$$
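The identity $\sin\theta = 1/\|P\|_2$ is easy to test numerically in the simplest setting, two distinct lines through the origin in the plane. The Python sketch below is only an illustration under that assumption; the function names are hypothetical, and the 2-norm is computed from the closed-form eigenvalues of $A^TA$ for a 2 × 2 matrix.

```python
import math


def oblique_projector(r, n):
    """2 x 2 projector onto span(r) along span(n), for non-parallel r and n."""
    w = (-n[1], n[0])                       # w is orthogonal to n, so P n = 0
    d = w[0] * r[0] + w[1] * r[1]           # nonzero exactly when r, n are not parallel
    return [[r[0] * w[0] / d, r[0] * w[1] / d],
            [r[1] * w[0] / d, r[1] * w[1] / d]]


def two_norm(a):
    """Largest singular value of a 2 x 2 matrix."""
    t = sum(x * x for row in a for x in row)             # trace(A^T A)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]          # det(A)
    disc = max(t * t - 4.0 * det * det, 0.0)
    return math.sqrt((t + math.sqrt(disc)) / 2.0)


def minimal_angle(r, n):
    """Minimal angle between the lines span(r) and span(n), in [0, pi/2]."""
    c = abs(r[0] * n[0] + r[1] * n[1]) / (math.hypot(*r) * math.hypot(*n))
    return math.acos(min(c, 1.0))


r, n = (3.0, 1.0), (1.0, 2.0)
P = oblique_projector(r, n)
print(math.sin(minimal_angle(r, n)), 1.0 / two_norm(P))
```

As the two lines approach each other, `two_norm(P)` grows without bound, which is the numerical face of the statement that nearly parallel complementary subspaces give badly conditioned oblique projectors.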


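4. BACK TO ORTHOGONAL PROJECTORS. For subspaces $\mathcal{R}, \mathcal{N} \subseteq \mathbb{R}^n$ such
that $\dim \mathcal{R} = \dim \mathcal{N}$, the difference $P_{\mathcal{R}} - P_{\mathcal{N}}$ of the associated orthogonal projectors
is of special interest because $\|P_{\mathcal{R}} - P_{\mathcal{N}}\|_2$ is a common measure of the distance
or separation between $\mathcal{R}$ and $\mathcal{N}$. It is therefore natural to inquire about what can
be said about the minimal angle between complementary spaces in terms of the
difference $P_{\mathcal{R}} - P_{\mathcal{N}}$. The following theorem provides some answers.


Theorem 4.1. For nonzero subspaces $\mathcal{R}, \mathcal{N} \subseteq \mathbb{R}^n$, let $P_{\mathcal{R}}$ and $P_{\mathcal{N}}$ denote the orthogonal
projectors onto $\mathcal{R}$ and $\mathcal{N}$, respectively, and let $\theta$ be the minimal angle between $\mathcal{R}$
and $\mathcal{N}$. The following two statements are true.


• $\mathcal{R}$ and $\mathcal{N}$ are complementary spaces if and only if $P_{\mathcal{R}} - P_{\mathcal{N}}$ is nonsingular. (4.1)
• If $\mathcal{R}$ and $\mathcal{N}$ are complementary spaces, then $\sin\theta = 1/\|(P_{\mathcal{R}} - P_{\mathcal{N}})^{-1}\|_2$. (4.2)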
Proof of (4.1): The orthogonal matrices $U$ and $V$ which were introduced in the
proof of Theorem 3.1 to decompose $P$ also decompose $P_{\mathcal{R}} - P_{\mathcal{N}}$ in the sense that

$$U^T (P_{\mathcal{R}} - P_{\mathcal{N}}) V = \begin{pmatrix} U_1^T \\ U_2^T \end{pmatrix} (U_1 U_1^T - V_2 V_2^T)(V_1 \mid V_2)
= \begin{pmatrix} U_1^T V_1 & 0 \\ 0 & -U_2^T V_2 \end{pmatrix}. \tag{4.3}$$

Assume first that $\mathcal{R}$ and $\mathcal{N}$ are nonzero complementary subspaces. If $\dim \mathcal{R} = r$,
then $U_1^T V_1$ is $r \times r$ and $U_2^T V_2$ is $(n-r) \times (n-r)$, so $P_{\mathcal{R}} - P_{\mathcal{N}}$ is nonsingular if and
only if $U_1^T V_1$ and $U_2^T V_2$ are each nonsingular. But we already know from the proof
of Theorem 3.1 that $U_1^T V_1 = (C^{-1})^T$ is nonsingular, so we only need to prove that
$U_2^T V_2$ is nonsingular. If $P$ is the oblique projector onto $\mathcal{R}$ along $\mathcal{N}$, then

$$PU_1 = U_1 \quad\text{and}\quad PV_2 = 0,$$

so that

$$V_2^T (I - P) U_2 U_2^T V_2 = V_2^T (I - P)(I - U_1 U_1^T) V_2 = V_2^T V_2 = I.$$

Thus $U_2^T V_2$ is nonsingular with $(U_2^T V_2)^{-1} = V_2^T (I - P) U_2$, and consequently
$P_{\mathcal{R}} - P_{\mathcal{N}}$ is nonsingular. Conversely, if $P_{\mathcal{R}} - P_{\mathcal{N}}$ is nonsingular, and if $\dim \mathcal{R} = r > 0$ and
$\dim \mathcal{N} = k > 0$, then $U_1^T V_1$ is $r \times (n-k)$ and $U_2^T V_2$ is $(n-r) \times k$, so (4.3) insures
that the rows as well as the columns in each of these products must be linearly
independent. In other words, $U_1^T V_1$ and $U_2^T V_2$ must both be square and nonsingular,
so $k = n - r$. Let

$$Q = U_1 (V_1^T U_1)^{-1} V_1^T,$$

and notice that $Q^2 = Q$, so that $Q$ is a projector. If $R(\cdot)$ and $N(\cdot)$ denote range
and nullspace, respectively, then

$$R(Q) \subseteq R(U_1) = R(QU_1) \subseteq R(Q) \implies R(Q) = \mathcal{R},$$

and

$$N(Q) \supseteq N(V_1^T) = \mathcal{N}, \quad \dim N(Q) = n - \dim R(Q) = n - r = k = \dim \mathcal{N} \implies N(Q) = \mathcal{N}.$$

In other words, $Q = P$ is the oblique projector onto $\mathcal{R}$ along $\mathcal{N}$. Therefore, since
the range and nullspace of any projector are complementary spaces, it must be the
case that $\mathcal{R} \oplus \mathcal{N} = \mathbb{R}^n$. ∎


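Proof of (4.2): If $\mathcal{R}$ and $\mathcal{N}$ are complementary, then $P_{\mathcal{R}} - P_{\mathcal{N}}$ is nonsingular, and
(4.3) together with (2.6) can be used to conclude that

$$\|(P_{\mathcal{R}} - P_{\mathcal{N}})^{-1}\|_2 = \max\bigl\{ \|(U_1^T V_1)^{-1}\|_2,\ \|(U_2^T V_2)^{-1}\|_2 \bigr\}. \tag{4.4}$$

But $\|(U_1^T V_1)^{-1}\|_2 = \|(U_2^T V_2)^{-1}\|_2$ because we can again use (2.5) to write

$$\begin{aligned}
\frac{1}{\|(U_1^T V_1)^{-1}\|_2^2} &= \min_{\|x\|_2 = 1} \|U_1^T V_1 x\|_2^2 = \min_{\|x\|_2 = 1} x^T V_1^T U_1 U_1^T V_1 x \\
&= \min_{\|x\|_2 = 1} \bigl(1 - x^T V_1^T U_2 U_2^T V_1 x\bigr) \\
&= 1 - \max_{\|x\|_2 = 1} \|U_2^T V_1 x\|_2^2 = 1 - \|U_2^T V_1\|_2^2,
\end{aligned}$$

and a similar argument proves that

$$\frac{1}{\|(U_2^T V_2)^{-1}\|_2^2} = 1 - \|U_2^T V_1\|_2^2.$$

Therefore, the results of Theorem 3.1 insure that

$$\|(P_{\mathcal{R}} - P_{\mathcal{N}})^{-1}\|_2 = \|(U_1^T V_1)^{-1}\|_2 = \|C^T\|_2 = \|C\|_2 = \|P\|_2 = \frac{1}{\sin\theta}. \qquad ∎$$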

Theorem 3.1 is not new—Gohberg and Krein [5] attribute it to Ljance [8]—but 
it seems to have escaped the notice of many writers and teachers of linear algebra. 
We have not seen Theorem 4.1 in the literature. 
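Statement (4.2) can be checked numerically for two distinct lines in the plane. The following Python sketch is an illustration under that assumption only; the helper names are hypothetical and the 2 × 2 inverse and 2-norm are written out by hand so that the block needs nothing beyond the standard library.

```python
import math


def orthogonal_projector(v):
    """Orthogonal projector onto span(v) in the plane: v v^T / (v^T v)."""
    s = v[0] * v[0] + v[1] * v[1]
    return [[v[0] * v[0] / s, v[0] * v[1] / s],
            [v[1] * v[0] / s, v[1] * v[1] / s]]


def subtract(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def inverse(a):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def two_norm(a):
    """Largest singular value of a 2 x 2 matrix."""
    t = sum(x * x for row in a for x in row)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = max(t * t - 4.0 * det * det, 0.0)
    return math.sqrt((t + math.sqrt(disc)) / 2.0)


def sin_minimal_angle(r, n):
    """Sine of the minimal angle between span(r) and span(n)."""
    c = abs(r[0] * n[0] + r[1] * n[1]) / (math.hypot(*r) * math.hypot(*n))
    return math.sqrt(max(1.0 - c * c, 0.0))


r, n = (3.0, 1.0), (1.0, 2.0)
difference = subtract(orthogonal_projector(r), orthogonal_projector(n))
print(sin_minimal_angle(r, n), 1.0 / two_norm(inverse(difference)))
```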


5. CONSEQUENCES. Although the following facts about projectors are often 
proved by separate (and sometimes substantial) arguments, they turn out to be 
immediate consequences of Theorem 3.1 and Theorem 4.1. 


Corollary 5.1. $\|P\|_2 \ge 1$ for every non-zero projector $P$, and $\|P\|_2 = 1$ if and only if
$P$ is an orthogonal projector.


Corollary 5.2. $\|I - P\|_2 = \|P\|_2$ for all projectors $P$ that are not zero and not equal to
the identity.


Corollary 5.3. Let $u$ and $v$ be vectors in $\mathbb{R}^n$ with $v^T u = 1$. If $\theta$ is the minimal angle
between $u$ and $v^\perp$ (the space orthogonal to $v$), then

$$\|I - uv^T\|_2 = \|uv^T\|_2 = \frac{1}{\sin\theta} = \|u\|_2 \|v\|_2.$$

Proof: The first equality follows from Corollary 5.2 and the second one from
Theorem 3.1. The fact that $\|uv^T\|_2 = \|u\|_2 \|v\|_2$ follows from properties of the
two-norm because

$$\|u\|_2 \|v\|_2 = \frac{\|uv^T v\|_2}{\|v\|_2} \le \max_{x \ne 0} \frac{\|uv^T x\|_2}{\|x\|_2} = \|uv^T\|_2 \le \|u\|_2 \|v\|_2. \qquad ∎$$

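Corollary 5.4. If $\theta$ is the minimal angle between complementary spaces $\mathcal{R}, \mathcal{N} \subseteq \mathbb{R}^n$,
and if $\theta^\perp$ is the minimal angle between $\mathcal{R}^\perp$ and $\mathcal{N}^\perp$, then $\theta = \theta^\perp$.


Proof: This follows from Theorem 3.1 together with Corollary 5.2. The result is
also a corollary of Theorem 4.1 because

$$\|(P_{\mathcal{R}^\perp} - P_{\mathcal{N}^\perp})^{-1}\|_2 = \|((I - P_{\mathcal{R}}) - (I - P_{\mathcal{N}}))^{-1}\|_2 = \|(P_{\mathcal{R}} - P_{\mathcal{N}})^{-1}\|_2. \qquad ∎$$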


Corollary 5.5. For complementary spaces $\mathcal{R}, \mathcal{N} \subseteq \mathbb{R}^n$, let $P$ be the oblique projector
onto $\mathcal{R}$ along $\mathcal{N}$, and let $Q$ denote the oblique projector onto $\mathcal{R}^\perp$ along $\mathcal{N}^\perp$. If $P_{\mathcal{R}}$
and $P_{\mathcal{N}}$ are the orthogonal projectors onto $\mathcal{R}$ and $\mathcal{N}$, respectively, and if $\theta$ is the
minimal angle between $\mathcal{R}$ and $\mathcal{N}$, then each of the following statements is true.


• $(P_{\mathcal{R}} - P_{\mathcal{N}})^{-1} = P - Q$
• $\sin\theta = \dfrac{1}{\|P - Q\|_2}$
• $\|P - Q\|_2 = \|P\|_2$


Proof: The first equation can be derived from (4.3), or it can be verified by direct
multiplication. The second and third equations follow from the first in conjunction
with the results of Theorems 3.1 and 4.1. ∎
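The first identity of Corollary 5.5 can be verified by direct multiplication in a small case. The Python sketch below does this for two lines in the plane; it is an illustration only, with hypothetical helper names, and it builds $Q$ from the perpendicular lines rather than from any formula in terms of $P$.

```python
def oblique_projector(r, n):
    """2 x 2 projector onto span(r) along span(n), for non-parallel r and n."""
    w = (-n[1], n[0])                       # w is orthogonal to n
    d = w[0] * r[0] + w[1] * r[1]
    return [[r[0] * w[0] / d, r[0] * w[1] / d],
            [r[1] * w[0] / d, r[1] * w[1] / d]]


def orthogonal_projector(v):
    s = v[0] * v[0] + v[1] * v[1]
    return [[v[0] * v[0] / s, v[0] * v[1] / s],
            [v[1] * v[0] / s, v[1] * v[1] / s]]


def subtract(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def multiply(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)] for i in range(2)]


def perp(v):
    """A vector spanning the orthogonal complement of span(v) in the plane."""
    return (-v[1], v[0])


r, n = (3.0, 1.0), (1.0, 2.0)
P = oblique_projector(r, n)                      # onto R along N
Q = oblique_projector(perp(r), perp(n))          # onto R-perp along N-perp
difference = subtract(orthogonal_projector(r), orthogonal_projector(n))
product = multiply(difference, subtract(P, Q))   # should be the identity matrix
print(product)
```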


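Corollary 5.6. For complementary spaces $\mathcal{R}, \mathcal{N} \subseteq \mathbb{R}^n$, the oblique projector $P$ onto
$\mathcal{R}$ along $\mathcal{N}$ is given by the pseudoinverse of $P_{\mathcal{N}^\perp} P_{\mathcal{R}}$, where $P_{\mathcal{R}}$ and $P_{\mathcal{N}^\perp}$ are the
orthogonal projectors onto $\mathcal{R}$ and $\mathcal{N}^\perp$, respectively. That is,

$$P = (P_{\mathcal{N}^\perp} P_{\mathcal{R}})^{\dagger}.$$

Furthermore, if $\theta$ is the minimal angle between $\mathcal{R}$ and $\mathcal{N}^\perp$, then

$$\cos\theta = \|P^{\dagger}\|_2.$$

Proof: To obtain the first equality, use (3.2) together with $C^{-1} = V_1^T U_1$ to write

$$P^{\dagger} = V \begin{pmatrix} C^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T = V_1 C^{-1} U_1^T = V_1 V_1^T U_1 U_1^T = P_{\mathcal{N}^\perp} P_{\mathcal{R}}.$$

Now take the pseudoinverse of both sides (see [3] for details concerning pseudoinverses).
The second equality is a consequence of the first in conjunction with
Theorem 2.1. ∎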


ACKNOWLEDGMENTS. We thank Steve Campbell for insightful discussions as well as the referee for 
suggestions that improved the exposition of our paper. 
The Four-Vertex Theorem
Revisited—Two Variations on the 
Old Theme 


Serge Tabachnikov 


The classical four-vertex theorem states that a closed imbedded smooth plane
curve has at least four vertices; a vertex is an extremum of curvature. There are
many proofs of this theorem; see, e.g., [B-G, O] and the references therein. In
some recent works the four-vertex theorem was considered and generalized from
the viewpoint of symplectic topology and Sturm theory; see [A1-4, T, G-M-O,
O-T]. We present here two results inspired by the four-vertex theorem.

With a smooth plane curve γ another curve Γ, called its caustic (or evolute), is
associated: Γ is the envelope of the family of normal lines to γ. Generically the
curvature extrema of γ are simple maxima or minima (so that the second
derivative of curvature does not vanish at its critical points). Assume that γ is in
general position in this sense. Then Γ is a smooth front, that is, a singular curve
such that at each point the tangent line is well defined; see Fig. 1 (in more
technical terms, a front is the projection of a smooth Legendrian curve in the
contact manifold of contact elements of the plane to this plane; the general
position condition means that the Legendrian curve has only simple tangency with
the fibers of the projection; see [A2] for details). Singularities of Γ correspond to
vertices of γ; generically they are cusps, as shown in the figure. The four-vertex
theorem states that the caustic of a closed imbedded curve has at least four cusps.
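For a concrete instance of the classical statement, the vertices of an ellipse can be counted numerically. The Python sketch below is only an illustration: it assumes the standard parametrization $(a\cos t, b\sin t)$, whose curvature is $ab/(a^2\sin^2 t + b^2\cos^2 t)^{3/2}$, and counts sign changes of the discrete derivative of curvature around the closed curve. The function names and the sampling offset are choices made here.

```python
import math


def ellipse_curvature(t: float, a: float, b: float) -> float:
    """Curvature of the ellipse (a cos t, b sin t) at parameter t."""
    return a * b / (a * a * math.sin(t) ** 2 + b * b * math.cos(t) ** 2) ** 1.5


def count_vertices(curvature, samples: int = 2000, offset: float = 0.3) -> int:
    """Count curvature extrema on [0, 2 pi) via sign changes of finite differences.

    The offset keeps sample points away from the symmetric positions of the extrema,
    so that no finite difference is zero by symmetry.
    """
    ts = [2.0 * math.pi * (i + offset) / samples for i in range(samples)]
    ks = [curvature(t) for t in ts]
    diffs = [ks[(i + 1) % samples] - ks[i] for i in range(samples)]
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    # signs[i - 1] wraps around for i = 0, which closes the cycle.
    return sum(1 for i in range(len(signs)) if signs[i] != signs[i - 1])


print(count_vertices(lambda t: ellipse_curvature(t, 2.0, 1.0)))
```

The ellipse attains the minimum of four allowed by the theorem; each vertex corresponds to one cusp of its evolute.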


Figure 1 


Parametrize the curve γ by length parameter; we write γ(t) and use primes to
indicate the derivative with respect to t. Then the normal line to γ at point γ(t) is
generated by the acceleration vector γ″(t). We formulate the four-vertex theorem
once again: the envelope of the family of lines through points γ(t) in the directions
of γ″(t) has at least four cusps. This is the statement we generalize in our first
theorem.

Consider a smooth plane curve γ. Call its parametrization (not necessarily by
length) definite if the acceleration vector revolves all the time in the same sense;
analytically: γ″(t) ∧ γ‴(t) ≠ 0 for all t, where ∧ denotes the determinant of two
vectors. Let Γ be the envelope of the family of lines l(t) through points γ(t) in the
direction of γ″(t). As before Γ is a front; the points of γ corresponding to its cusps
will be referred to as (generalized) vertices.


Theorem 1. A generic convex closed smooth curve with a definite parametrization has 
at least four generalized vertices. 


Proof: To start with we claim that through every point x in the plane at least one
(actually two) of the lines l(t) passes. Indeed, consider the function f(t) =
(γ(t) − x) ∧ γ′(t). This function on the circle has a maximum at some point t₀.
Then 0 = f′(t₀) = (γ(t₀) − x) ∧ γ″(t₀), that is, the vectors γ(t₀) − x and γ″(t₀)
are collinear. Thus x ∈ l(t₀). Since the lines l(t) are tangent to Γ we conclude that
from each point in the plane there exists a tangent line to Γ.

Next consider the front Γ. It is oriented by the vectors γ″(t). Since the
parametrization of γ is definite, the tangent direction to Γ revolves in the same
sense all the time, and its total turn is 2π. That is, the Gauss map of Γ is
one-to-one. If Γ has no cusps then it is a closed convex curve, and there are no
tangent lines to Γ from points inside it. This contradicts the previous paragraph.

Alternatively, a somewhat messy computation, which we omit, shows that
vertices of γ are critical points of the function g(t) = γ‴(t) ∧ γ″(t)/(γ″(t) ∧
γ′(t))² (if t is the arc length parameter then g(t) is the negative of the curvature).
This function on the circle has at least one maximum and one minimum, hence γ
has at least two vertices.

Finally we want to show that g(t) has at least two local maxima and two local
minima. Suppose not; then Γ has only two cusps. Consider a locally constant
function φ(x) in the complement of Γ whose value at point x equals the number
of tangent lines to Γ through x. The value of this function increases by 2 as x
crosses Γ from the locally concave to the locally convex side; see Fig. 2. Let x be
sufficiently far away from Γ. Since the Gauss map is one-to-one, φ(x) = 2 (indeed
there exist exactly two tangent lines to Γ from every point of the circle at infinity;
by continuity the same holds for sufficiently distant points x).


Figure 2


Consider the line through two cusps of Γ (which well may coincide); assume it is
horizontal; see Fig. 3. Then the height function restricted to Γ attains either
minimum or maximum (or both) not in a cusp. Assume it is maximum; draw the
horizontal line l through it. Since Γ lies below this line, φ = 2 above it. Therefore
φ(x) = 0 immediately below l, and there are no tangent lines to Γ from x. This
contradicts the first paragraph of the proof. Q.E.D.


Figure 3

We pose some questions. First, is the assumption that the parametrization is
definite really needed? If it fails, the front Γ will have branches that go to infinity.

Secondly, can one show that the function g(t) has "many" critical points
analytically? An interesting example is the affine parametrization characterized by
γ′(t) ∧ γ″(t) = 1 for all t (it is not necessarily definite). In this case there exist at
least six affine vertices; see [G-M-O] for a modern treatment.

Thirdly, call a generalized diameter of γ(t) a chord which is collinear with the
acceleration vectors γ″(t) at both end points. Said differently, a generalized
diameter is a double tangent line of Γ. Does γ always have at least 2 diameters? It
is the case for the length parametrization.

Now we proceed to the second theorem. Given a closed plane curve γ we are
interested in the following tripod configurations: three perpendiculars to γ dropped
from one point that make angles of 2π/3; see Fig. 4.


Theorem 2. For any smooth convex closed curve there exist at least two tripod 
configurations. 


Figure 4 
Proof: Let γ(t) be a length parametrization, and let α(t) be the angle made by
γ′(t) with a fixed direction. Because of convexity we may (and will) use α as a
parameter on γ. The curvature of γ is α′(t) = |γ″(t)|.

Choose an origin O inside γ and let p(t) be the signed distance from O to the
line l(t), i.e. p = γ ∧ γ″/|γ″|. Let q = γ ∧ γ′; we consider it as a function of α.
One has:

$$\frac{dq}{d\alpha} = \frac{dq}{dt} \frac{dt}{d\alpha} = \frac{\gamma \wedge \gamma''}{|\gamma''|} = p.$$

Now consider the function q(α − 2π/3) + q(α) + q(α + 2π/3). It has a minimum
and a maximum on the circle, say at points α₁ and α₂. Since p is the
derivative of q we have:

$$p(\alpha_i - 2\pi/3) + p(\alpha_i) + p(\alpha_i + 2\pi/3) = 0, \quad i = 1, 2.$$

Consider three lines l(α − 2π/3), l(α) and l(α + 2π/3). They make an
equilateral triangle; let a be the length of its side and A its area. Then

$$a\bigl(p(\alpha - 2\pi/3) + p(\alpha) + p(\alpha + 2\pi/3)\bigr) = 2A;$$

see Fig. 5 (it does not matter whether the origin lies inside or outside the
triangle). For α₁, α₂ the left-hand side vanishes and the triangle degenerates to a
point. We obtain two tripods. Q.E.D.

The reader may remember a version of the tripod theorem from school:
there exists a point inside a triangle whose angles are all less than 2π/3 from
which all sides are seen at angles 2π/3. This point minimizes the sum of distances
to the vertices.

We conclude with another question: can the convexity assumption in the tripod 
theorem be relaxed? Does it hold for self-intersecting curves? 


Figure 5 
Entire Functions Which Vanish at
Infinity 


R. B. Burckel 


In a recent article in this journal David Armitage [1] showed how the pole-shoving 
technique used in 1885 by Carl Runge, and in most textbook accounts today of 
Runge’s approximation theorem, can be exploited to prove a non-uniqueness 
theorem for the Radon transform. The idea is to push poles out to infinity through 
the region S between two confocal parabolas, while keeping the functions small 
outside S. Since every straight line spends only a compact amount of time in S, the 
limit entire function g vanishes at infinity along every line, although it is not 
identically 0. This is quite startling in view of the fact (maximum modulus
principle) that the vanishing condition $\lim_{r \to +\infty} \sup_{|z| = r} |F(z)| = 0$ forces an entire
function F to be 0. If one chooses a “transcendental swath” $S := \{x + iy : x \ge 1,\ ae^x < y < be^x\}$ $(0 < a < b)$, then the non-zero entire function g produced
even vanishes at infinity along every unbounded algebraic curve, since every such
curve must escape from this S. This clever idea goes back to Harald Bohr [2].
I’d like to show the reader how slight modifications in Armitage’s development
prove this dramatic result of Bohr. More results of this flavor can be found in 
Burckel and Saeki [3], which also contains a more extensive bibliography. 

First of all, we can take over Armitage’s pole-shoving lemma without change: 


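Lemma 1. Suppose $z_1, z_2 \in \mathbb{C}$, $0 < |z_1 - z_2| < 1$, $\varphi_1$ is holomorphic in $\mathbb{C} \setminus \{z_1\}$ and
$\varepsilon > 0$. Then there exists a holomorphic function $\varphi_2$ in $\mathbb{C} \setminus \{z_2\}$ which satisfies


\[ |\varphi_2(z) - \varphi_1(z)| < \varepsilon (1 + |z|)^{-2} \quad \text{whenever } |z - z_2| > 1. \]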


Given $0 < a < b < +\infty$, write
\[ S = S_{a,b} := \{(x, y) \in \mathbb{R}^2 : x \ge 1,\ ae^x < y < be^x\} \]


and consider any non-zero polynomial P(X,Y) € C[X,Y]. Then we have the 
following elementary 


Lemma 2. The set $S \cap P^{-1}(0)$ is bounded.


Proof: There are polynomials $Q_0(X), \dots, Q_n(X) \in \mathbb{C}[X]$ such that
\[ P(X, Y) = Q_n(X) + Q_{n-1}(X)Y + \cdots + Q_0(X)Y^n. \]


We may assume that $Q_0(X)$ is not 0 and write $Q_0(X) = \sum_{j=0}^{N} c_j X^{N-j}$ with $c_0 \ne 0$.
Consider $(x, y) \in S \cap P^{-1}(0)$. Then $x \ge 1$ and $y > 0$, and


P(x, y) n O(x) 
0= Cielya = Lr + 


whence 


n (x N 
dc yo LAO > () 


j=l cae")? j=1 !€o 
since $x \ge 1$ and $y > ae^x$. Since $\lim_{x \to +\infty} x^k e^{-x} = 0$ for every $k \in \mathbb{N}$, the right-hand
side of inequality ($*$) is a bounded function of $x \in [1, +\infty[$. If B is a finite bound,
then ($*$) shows that $|x| \le B$, and then $|y| < be^x \le be^B$.
The Construction. Special notation for the sequel is 


\[ z_k := \log(k/3) + 2ik/3 \quad \text{for all integers } k \ge 81, \]


231 


8si(Z) = | ) zEC\ {Zp}, 


\[ S := \{(x, y) \in \mathbb{R}^2 : x \ge 1,\ e^x/9 < y < 27e^x\}. \]


The reader can easily check that 


\[ |z_k - z_{k-1}| < 1 \quad \forall k > 81 \tag{1} \]
and, using only the crude estimate $e < 3$,
\[ D(z_k, 2) := \{z \in \mathbb{C} : |z - z_k| < 2\} \subset S \quad \forall k \ge 81. \tag{2} \]
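The two checks (1) and (2) left to the reader can also be spot-checked numerically. The following is only a sketch, not a proof: it samples points on the circle of radius 2 about each $z_k$ for a finite range of k, and the function names, the range `k_max`, and the sample count are illustrative assumptions.

```python
import cmath
import math


def z(k: int) -> complex:
    """The pole location z_k = log(k/3) + 2ik/3 from the construction."""
    return complex(math.log(k / 3), 2 * k / 3)


def in_S(w: complex) -> bool:
    """Membership in S = {x >= 1, e^x / 9 < y < 27 e^x}."""
    x, y = w.real, w.imag
    return x >= 1 and math.exp(x) / 9 < y < 27 * math.exp(x)


def check(k_max: int = 500, samples: int = 90) -> bool:
    """Spot-check (1) and (2) for 81 <= k <= k_max (hypothetical finite range)."""
    for k in range(81, k_max + 1):
        # (1): consecutive poles are less than 1 apart
        if k > 81 and abs(z(k) - z(k - 1)) >= 1:
            return False
        # (2): sample the boundary circle |w - z_k| = 2 and test it lies in S
        for s in range(samples):
            w = z(k) + 2 * cmath.exp(2j * math.pi * s / samples)
            if not in_S(w):
                return False
    return True


if __name__ == "__main__":
    print(check())
```

Sampling the boundary circle is a slightly stronger test than (2) requires, since (2) concerns the open disk.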


Just as in Armitage [1], (1) and Lemma 1 let us inductively construct, for $k > 81$,
functions $g_k$ holomorphic in $\mathbb{C} \setminus \{z_k\}$ that satisfy


\[ |g_k(z) - g_{k-1}(z)| < 2^{-k} (1 + |z|)^{-2} \quad \forall z \in \mathbb{C} \setminus D(z_k, 1). \tag{3} \]
As there, the sequence $\{g_k\}_{k \ge 81}$ converges uniformly on each compact subset of $\mathbb{C}$
to an entire function g which is not constant and which satisfies


\[ |g(z)| \le 8|z_{81}|^2 |z|^{-2} \quad \forall z \in \mathbb{C} \setminus \bigcup_{k \ge 81} D(z_k, 1) \setminus D(0, 2|z_{81}|). \tag{4} \]
The Cauchy estimates let us infer from (4) useful bounds on the derivatives of g as
well:


\[ |g^{(n)}(\xi)| \le 32\, n!\, |z_{81}|^2 |\xi|^{-2} \quad \forall \xi \in \mathbb{C} \setminus \bigl(S \cup D(0, 2|z_{81}| + 1)\bigr). \tag{5} \]


Indeed, if $\xi \in \mathbb{C} \setminus S$, then by (2) the whole disk $D(\xi, 1)$ lies in $\mathbb{C} \setminus \bigcup_{k \ge 81} D(z_k, 1)$
and therefore (4) yields


\[ |g(z)| \le 8|z_{81}|^2 |z|^{-2} \le 8|z_{81}|^2 (|\xi| - 1)^{-2} \le 32|z_{81}|^2 |\xi|^{-2} \]


for all $z \in D(\xi, 1)$, provided $|\xi| > 2|z_{81}| + 1$ (to insure that $|z| > 2|z_{81}|$ and that
$|\xi| - 1 \ge \tfrac12|\xi|$). Now apply the Cauchy estimates in the disk $D(\xi, 1)$.


Conclusion. As Armitage pointed out, the non-zero entire function g’ has the 
interesting property that 


\[ \int_L |g'| < +\infty \quad \text{and} \quad \int_L g' = 0 \quad \text{for every straight line } L \text{ in } \mathbb{C}. \]


But g also has another remarkable property. Let $\Gamma: [0, +\infty[ \to \mathbb{C}$ be continuous
and satisfy $\lim_{t \to +\infty} |\Gamma(t)| = +\infty$. Suppose in addition that $\Gamma$ is algebraic, that is, it
lies in the zero-set of some non-zero polynomial $P(X, Y) \in \mathbb{C}[X, Y]$. For every
such $\Gamma$ and every $n \in \mathbb{N} \cup \{0\}$


\[ \lim_{t \to +\infty} g^{(n)}(\Gamma(t)) = 0. \]


Indeed, as $t \to +\infty$ the modulus of the point $\Gamma(t)$ in $P^{-1}(0)$ converges to $+\infty$, so
by Lemma 2 this point leaves $S \cup D(0, 2|z_{81}| + 1)$ never to return, and inequality
(5) becomes valid for $\xi = \Gamma(t)$.


Final Remark. Suppose f(t, z) is integrable in ¢ and holomorphic in z. Various 
boundedness conditions on f insure that />f(t, z) dt is holomorphic in z, and 
conclusions of this type are indispensable in analysis. However, without some 
supplemental conditions the integral may fail to be holomorphic. The entire 
function g above was used by Hayman [4] to construct an example of this: 
o¢ ‘g(tz) dt is not holomorphic in any neighborhood of 0. 
A Converse to Cauchy’s Inequality


D. Zagier 


Denote by $\mathfrak{M}$ the set of monotone decreasing functions $f: [0, \infty) \to [0, 1]$ for which
$I(f) = \int_0^\infty f(x)\,dx$ converges. For $f, g \in \mathfrak{M}$ the scalar product $(f, g) = I(fg)$ converges,
and the Cauchy-Schwarz inequality and the inequalities $0 \le f(x) \le 1$ imply
the estimate


\[ (f, g) \le \min\bigl(I(f),\, I(g),\, (f, f)^{1/2}(g, g)^{1/2}\bigr) \qquad (f, g \in \mathfrak{M}). \tag{1} \]


An inequality for $(f, g)$ in terms of the same data but in the other direction,
namely


\[ (f, g) \ge \frac{(f, f)(g, g)}{\max(I(f), I(g))} \qquad (f, g \in \mathfrak{M}), \tag{2} \]


was proved in an earlier article with the same title (up to translation) by a trick
involving a quadruple integral [2]. We give here a more general result with a much
simpler proof.


Theorem. Let f and g be monotone decreasing nonnegative functions on $[0, \infty)$. Then


\[ (f, g) \ge \frac{(f, F)(g, G)}{\max(I(F), I(G))} \tag{3} \]


for any integrable (but not necessarily monotone) functions $F, G: [0, \infty) \to [0, 1]$.


Proof: For all $x \ge 0$ we have
\[ (f, F) = I(F) f(x) + \int_0^\infty [f(t) - f(x)]\,F(t)\,dt \le I(F) f(x) + \int_0^x [f(t) - f(x)]\,dt \]
and hence, since $\int_0^x G(t)\,dt$ is bounded from above by both x and $I(G)$,


\[ (f, F) \int_0^x G(t)\,dt \le I(F)\, x f(x) + I(G) \int_0^x [f(t) - f(x)]\,dt \le \max(I(F), I(G)) \int_0^x f(t)\,dt. \]


Now multiply by $-dg(x)$ and integrate by parts from 0 to $\infty$. The left-hand side
gives $(f, F)(g, G)$, the right-hand side gives $\max(I(F), I(G))\,(f, g)$, and the inequality
remains true because the measure $-dg(x)$ is nonnegative. ∎
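Inequality (3) can be sanity-checked numerically on step functions, for which the integrals are exact finite sums. The sketch below is an illustration only: the helper names, the grid, and the particular choices of f, g, F, G are assumptions made for the example and are not taken from the article.

```python
import math
import random


def both_sides(f, g, F, G, h):
    """Return ((f,g), (f,F)(g,G)/max(I(F),I(G))) for step functions.

    Each list holds the values of a function that is constant on cells of
    width h and zero afterwards. Requires max(I(F), I(G)) > 0.
    """
    def integral(u):
        return h * sum(u)

    def inner(u, v):
        return h * sum(a * b for a, b in zip(u, v))

    lhs = inner(f, g)
    rhs = inner(f, F) * inner(g, G) / max(integral(F), integral(G))
    return lhs, rhs


def random_example(seed, cells=2000, h=0.01):
    """f, g decreasing and nonnegative; F, G arbitrary with values in [0, 1]."""
    rng = random.Random(seed)
    xs = [i * h for i in range(cells)]
    f = [math.exp(-x) for x in xs]
    g = [1.0 / (1.0 + x) ** 2 for x in xs]
    F = [rng.random() for _ in xs]
    G = [rng.random() ** 3 for _ in xs]
    return both_sides(f, g, F, G, h)


if __name__ == "__main__":
    print(random_example(0))
    # Equality case: f = g = F = G = indicator function of [0, 1]
    step = [1.0] * 100 + [0.0] * 100
    print(both_sides(step, step, step, step, 0.01))
```

The second call illustrates that (3) is sharp when all four functions equal the indicator of an interval.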


Remarks. 1. Another proof of (3) can be obtained as follows. It is geometrically
clear (and easily proved) that for f monotone decreasing the largest value of $(f, F)$
as F ranges over integrable functions $[0, \infty) \to [0, 1]$ with a given value of $I(F)$ is
attained by taking F to be “as far left as possible,” i.e., equal to 1 for $0 \le x \le I(F)$
and to 0 otherwise. Therefore the maximum of $(f, F)(g, G)/A$ as F and G range
over functions $[0, \infty) \to [0, 1]$ with $\max(I(F), I(G)) \le A$ is equal to $A^{-1} \int_0^A f(x)\,dx \cdot \int_0^A g(x)\,dx$,
and this is $\le (f, g)$ because the average of the product of two
decreasing functions on an interval is at least equal to the product of their
averages.

2. A further generalization of (2) is the inequality


\[ W(fg) \ge \frac{W(fF)\,W(gG)}{\max(W(F), W(G))} \]


valid for any positive linear functional $W(f) = \int_0^\infty f(x) w(x)\,dx$ ($w(x) > 0$), monotone
decreasing functions f and g, and functions $F, G: [0, \infty) \to [0, 1]$ with $W(F)$
and $W(G)$ finite. To prove it, apply (3) to the functions $f \circ v$, $g \circ v$, $F \circ v$ and
$G \circ v$, where $v(x) = \int_0^x w(x')\,dx'$. The case F = f, G = g is the weighted generalization
of (2) proved in [1].

3. As pointed out in [1], both bounds (1) and (2) are best possible in terms of the
four parameters $I(f)$, $(f, f)$, $I(g)$, and $(g, g)$. The bound (2) cannot be attained
for generic values of these parameters but can be approached arbitrarily closely by
taking f and g to be step functions with only two non-zero values (i.e. equal to 1
for $x < x_0$, to C for $x_0 < x < x_1$, and to 0 for $x > x_1$, where $0 \le x_0 \le x_1$ and
$0 \le C \le 1$). Such functions with given values of $I(f)$ and $(f, f)$ form a one-parameter
family (the numbers $x_0$, $x_1$ and C determine each other). If $I(g) \le I(f)$ and
we let f move to the left ($x_0 \to 0$) and g to the right ($C \to 0$) in their respective
families, then $\int_0^\infty f(x) g(x)\,dx$ tends to $(f, f)(g, g)/I(f)$.

4. Monotone decreasing functions $f: [0, \infty) \to [0, 1]$ can be interpreted as the
integrals of probability measures ($f(x) = \int_x^\infty d\mu$ where $d\mu$ is a nonnegative measure
with integral 1). Hence (2) can be interpreted as a statement about correlations
of statistical distributions. One such result, which was the original motivation
for the inequality, is an estimate of the possible values of the “Gini coefficient” for 
a population consisting of two sub-populations, when the size, average income, and 
Gini coefficients of each of these is given [3]. (The Gini coefficient is a measure of 
the inequity of distributions of income in a large population which is used widely in 
mathematical economics.) Since the inequalities (2) and (3) are very general, they 
should have other applications, perhaps also in pure mathematics. 
UNSOLVED PROBLEMS
Edited by: Richard Guy & Richard Nowakowski 


In this department the MONTHLY presents easily stated unsolved problems dealing 
with notions ordinarily encountered in undergraduate mathematics. Each problem 
should be accompanied by relevant references (if any are known to the author) and by a 
brief description of known partial or related results. Typescripts should be sent to 


Richard Guy, Department of Mathematics & Statistics, The University of Calgary, 
Alberta, Canada T2N 1N4. 


Monthly Unsolved Problems, 1969-1995 


Richard K. Guy and Richard J. Nowakowski 


References in brackets are to year and page numbers of this MONTHLY, while dates 
in parentheses refer to publications listed at the end, and other items are labelled 
(tbp) if they are likely to be published formally, or as written communications (wrc) 
if publication plans are not presently known. Dates and pages in brackets are also 
appended to items in the bibliography indicating where the problem originally 
appeared in the MONTHLY. 

Klee [1970, 63] asked for the maximum length of a d-dimensional snake, where
by snake is meant a simple circuit in the d-cube which has no chords. If we denote
this maximum length (number of edges) by s(d), then Abbott & Katchalski (1991)
show that $s(d) \ge 77 \times 2^{d-8}$. Their paper contains a very good bibliography. The
previous best upper bound was Solov’jeva’s (1987) $s(d) \le 2^{n-1}\bigl(1 - \frac{2}{n^2 - n + 2}\bigr)$ for
$n \ge 7$. Hunter Snevily (1994) improves this to $s(d) \le 2^{n-1}\bigl(1 - \frac{1}{20n - 41}\bigr)$ for $n \ge 12$
and conjectures that $s(d) \le 3 \cdot 2^{n-3} + 2$ for $n \ge 5$.

Notice that Currie [1993, 790] refers to Keränen’s (1992) solution of the problem
mentioned by Brown [1971, 886]: there are no sequences on four symbols which 
contain no two identically equal consecutive segments. Jeffrey Shallit sends a 
bibliography of 59 items, only 12 of which are among the 25 in §E21 of the second 
(1994) edition of Unsolved Problems in Number Theory. 

Recall that an Ulam sequence [1973, 919], $\{a_i\} = (u, v)$ is defined by $a_1 = u$,
$a_2 = v$ and, for $n > 2$, $a_n$ is the least integer greater than $a_{n-1}$ expressible uniquely as the sum of two
distinct earlier terms. Cassaigne & Finch (tbp) have proved that all Ulam sequences
$(4, v)$, $5 \le v \equiv 1 \bmod 4$ have precisely three even terms and hence are
regular in the sense that their differences are ultimately periodic [1993, 946]. They
prove that the asymptotic density $A(v)$ of $(4, v)$ tends to 0 as $v \to \infty$, but is misbehaved
since

\[ \liminf_{v \equiv 1 \bmod 4} (v/2)^{\theta} \cdot A(v) = 1/4 < 0.27164 < \limsup_{v \equiv 1 \bmod 4} (v/2)^{\theta} \cdot A(v) \]

where $\theta = 2 - \log_2 3$. Shiriff & Pickover (wrc) have defined a natural multi-dimensional
extension of Ulam sequences in which multiplication replaces addition.
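The definition of an Ulam sequence translates directly into a brute-force generator. This is a minimal sketch under the usual reading of the definition, in which each new term must exceed the previous one; the function name and the quadratic-time search are illustrative choices, not anything from the cited papers.

```python
def ulam_sequence(u, v, count):
    """First `count` terms of the Ulam sequence (u, v), with u < v.

    Each new term is the least integer above the previous term that can be
    written in exactly one way as a sum of two distinct earlier terms.
    """
    terms = [u, v]
    while len(terms) < count:
        present = set(terms)
        candidate = terms[-1] + 1
        while True:
            # count representations candidate = a + b with a < b both earlier terms
            reps = sum(
                1 for a in terms
                if a < candidate - a and (candidate - a) in present
            )
            if reps == 1:
                break
            candidate += 1
        terms.append(candidate)
    return terms[:count]


if __name__ == "__main__":
    print(ulam_sequence(1, 2, 11))
    first = ulam_sequence(4, 5, 30)
    print([t for t in first if t % 2 == 0])
```

For the sequence (4, 5), the even terms found among the first thirty terms can be compared with the Cassaigne and Finch statement that there are precisely three.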

Hatada (1994) has republished his [1986, 628] problem, together with seven 
other problems involving the n-dimensional simplex; for example, if R is the 
circumradius and r the inradius, is R > nr for n > 2? 

In connexion with Dawson’s problem [1989, 31] to find a subset of a square that 
contains disjoint connected sets A and B each containing two opposite corners, 
Leroy F. Meyers observes that his solution to MONTHLY problem E1515 [1963, 95] 
implies a solution to the opposite corners problem. He gives two ways to make 
such a separation: 


(a) The separation of the rectangle [—2,2] X [—1,1] which is easily modified to 
separate the unit square. Let 


A = {(0,0)} U {(x, y):0<|xl<1 and y =sin(7/x)} 


and let B be the complement of A in the closed rectangle. As defined here, all 
four corners belong to B. But it’s easy to extend A to contain the segments from 
(1,0) to (1,2) and from (—1,0) to (—1, —2). 


(b) For the square [0, 1] x [0, 1] let 
A = {(0,0),(1,1)} U {(x,y):0<x<1 and 0<y<1 and y/x irrational} 


and let B be the complement of A in the closed square. 

These partitions are essentially those used by Meyers in his solutions to 
problems 10328 [1993, 689] and 10341 [1993, 874]. 

For a survey of recent results on permutation polynomials [1988, 243; 1993, 71] 
see Mullen (1993). 

In [1989, 129] Clark Carroll asked for polynomials with integer roots whose
derivatives all have integer roots. For cubics the answer is known and can be
found, for example, in Walter (1987) or in Buddenhagen, Ford & May (1992); see
also MONTHLY problem E3221, solved in [1989, 841-842]. For quartics with a
repeated root there is an infinity of solutions, given essentially by the rational
points on the elliptic curve $y^2 = x^3 - 156x + 560$, 57612 in Cremona (1992).

Ralph Buchholz writes that the quintic case (3, 1, 1) [i.e., a triple root and two
distinct ones] remains unsolved as also the quartic case (1, 1, 1, 1). He & Jim
MacDougall have looked at $y = x^n(x - 1)(x - a)$ for rational a and $n \ge 2$ and
tried to force (only) the first two derivatives to have rational roots. This led to an
elliptic curve as in the degree 4 case and the rank for $2 \le n \le 10$ is non-zero.
Unfortunately they have not yet proved that the rank is always non-zero, but they 
conjecture that it is. He & Kelly (1995) have a paper on this topic. 

In [1992, 178] Connett asked if a bottle with an inside perfectly reflecting 
surface could be designed so that a beam of light shone into it was permanently 
trapped. If the light emanates from one point, or if all the rays are parallel then 
such light traps were found by R. J. MacG. Dawson, B. E. McDonald, J. Mycielski 
and L. Pachter. If the light is diffuse (e.g., on the plane, every point of a linear 
segment shines in all directions, or in 3-dimensional space, every point of a disc 
shines in all directions) then no such trap is possible. This was shown by J. 
Mycielski (wrc). 

Klaus Leeb writes that in 1970 he formulated a problem equivalent to Parker’s 
permutation problem [1993, 287, and see 1993, 948] and that it was solved, modulo 
a conjecture that turned out to be Hall’s theorem, by a student, Klaus Winkelman 
(1979). Kemperman & Ott (1994) note that Hall (1952) and Fuchs (1958) had 
already answered the question: for what functions $h: G \to G$ on an (additive)
abelian group G does the equation $\sigma(x) + \tau(x) = h(x)$, $x \in G$ have as its
solutions a pair of permutations $\sigma$ and $\tau$ of G and they give explicit constructions
in a number of cases, including that when $h(x) = x$ and G is finite. They also
determine the finite groups where $\sigma$, $\tau$ can be chosen to be automorphisms.

Gunnar Blom is unable to claim any of James Currie’s $100 prizes [1993, 790], 
but believes that the methods of Blom & Thorburn (1982) will serve to find the 
generating function for the number of non-r-repetitive words of length n, where a 
non-r-repetitive word (r even) is one which contains no patterns of length r which 
can be broken into two identical blocks, e.g. 1111, 0101 (r = 4) or 101101, 000000, 
011011 (r = 6). 

David Callan (1995) uses Möbius inversion to settle his own problem [1994, 571]
about the permanent of a matrix of cotangents by showing that if J is the n by n
matrix of ones and C is the matrix with $c_{jk} = \cot\{(2k - 2j + 1)\pi/2n\}$, then


n(n — 2 
2 + ve ) nnd ynnd 
n(n — 2)(n — 4) 
2°4:°6 


In [1994, 1007], and in (1993) we discussed the permutation problem that Cayley 
called ‘Mousetrap’, unaware of the parallel work of Mundfrom (1994) who corrects 
the errors in Steen’s recurrences and calculates the numbers of permutations of n 
cards with 2 as the first hit. [Recall that a shuffled deck of n cards, numbered
1, 2, ..., n, is counted from the top. If the card number does not agree with the
count number, transfer the card to the bottom of the deck. If it does agree, set the 
card aside and start counting again from 1. The game is won if all cards are set 
aside, but lost if the count reaches n + 1.] 
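The rules of Mousetrap recalled above are easy to simulate. The sketch below follows those rules literally; the function names and the return convention (list of hits plus a win flag) are assumptions made for illustration.

```python
from collections import deque
from itertools import permutations


def mousetrap(deck):
    """Play Mousetrap on `deck` (a permutation of 1..n, top card first).

    Returns (hits, won): the cards set aside in order, and whether every
    card was set aside before the count reached n + 1.
    """
    n = len(deck)
    cards = deque(deck)
    hits = []
    count = 1
    while cards and count <= n:
        card = cards.popleft()
        if card == count:
            hits.append(card)   # a hit: set the card aside, restart the count
            count = 1
        else:
            cards.append(card)  # a miss: move the card to the bottom
            count += 1
    return hits, not cards


def winning_permutations(n):
    """Number of orderings of 1..n for which the game is won (brute force)."""
    return sum(1 for p in permutations(range(1, n + 1)) if mousetrap(p)[1])


if __name__ == "__main__":
    print(mousetrap([3, 2, 1]))
    print([winning_permutations(n) for n in range(1, 6)])
```

Brute force over all n! orderings is only practical for small n; it is meant as a way to experiment with the rules, not as a substitute for the recurrences discussed in the cited papers.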

In connexion with our ‘coin-weighing problems’ article [1995, 164], readers may 
be interested in a ‘problem of the week’ from Macalester College. This tradition 
was started 27 years ago by the late Joe Konhauser and is currently maintained by 
Stan Wagon. 

Problem #784. A Question of Imbalance. 

Five coins are identical in appearance except for labels A, B, C, D, and E. 
Each coin has a weight different from that of each of the others. Given an 
equal-arm balance, what is the minimum number of uses of the balance required 
to rank order the coins by weight? 

Of course, this is not an unsolved problem, but readers may be amused by it, 
and more persistent ones will generalize it to the ranking of n coins. Hallard Croft 
wrote, a quarter of a century ago: 


n 
per(xJ + C) =anl} x" + ae 


xno 4... 


This problem is due to Steinhaus. What is the least number of weighings, 
k(n), that will always suffice, and what strategy does one adopt in planning 
them? Steinhaus conjectures that one of these best strategies will also 
minimize the expected number of weighings necessary, where we assume that 
all original orderings are equally likely. 

On the one hand, since there are n! original possible arrangements, and
each weighing can only sort these into 2 sets, we have, by obvious information
theory, that


\[ K(n) = \lfloor \log_2(n!) \rfloor + 1 \sim \log_2 e\,\{(n + \tfrac12) \ln n - n\} \]


is a lower bound for k(n).
On the other hand, Ford & Johnson (1959) describe a very nice strategy,
which needs only $\kappa(n)$ weighings, where


\[ \kappa(n) = \sum_{k=1}^{n} \lceil \log_2 \tfrac34 k \rceil = n \lceil \log_2 \tfrac34 n \rceil - \lfloor 2^{\lfloor \log_2 6n \rfloor}/3 \rfloor + \lfloor \tfrac12 \log_2 6n \rfloor. \]


We now have the opposing extreme conjectures: 

(A) We can always find a ‘completely economic’ strategy, i.e., $k(n) = K(n)$;

(B) We cannot improve the Ford & Johnson scheme, i.e., $k(n) = \kappa(n)$.

The first case for which (A), (B) diverge is n = 12. Remarkably, they re- 
converge, temporarily, for n = 20, 21. Asymptotically the difference is clearly 
narrow, being sometimes as little as 0.028...n, sometimes more than 
0.110...n. 
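The two bounds K(n) and κ(n) can be tabulated with exact integer arithmetic, which makes the remark about n = 12 and n = 20, 21 easy to reproduce. This is a small sketch; the function names are illustrative, and the bit-length identities used in place of floating-point logarithms are noted in the comments.

```python
import math


def information_bound(n):
    """K(n) = floor(log2(n!)) + 1, computed exactly as the bit length of n!."""
    return math.factorial(n).bit_length()


def ford_johnson(n):
    """kappa(n) = sum over k = 1..n of ceil(log2(3k/4)).

    ceil(log2(3k/4)) is the least m with 2**(m+2) >= 3k, that is,
    (3k - 1).bit_length() - 2.
    """
    return sum((3 * k - 1).bit_length() - 2 for k in range(1, n + 1))


def ford_johnson_closed(n):
    """Closed form n*ceil(lg(3n/4)) - floor(2**floor(lg 6n) / 3) + floor(lg(6n)/2)."""
    lg6n = (6 * n).bit_length() - 1          # floor(log2(6n))
    return n * ((3 * n - 1).bit_length() - 2) - (2 ** lg6n) // 3 + lg6n // 2


if __name__ == "__main__":
    for n in range(3, 23):
        print(n, information_bound(n), ford_johnson(n))
```

For n = 1 and n = 2 the printed formula for K(n) exceeds κ(n) by one, because n! is then a power of 2; the comparison is meaningful from n = 3 onward. For n = 5 both functions give 7, the answer to Problem #784 above.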

Recent [around 1970!—Ed.] computation by M. S. Patterson at Cam- 
bridge strongly suggested that k(12) = 30, not 29, thus disproving (A). This 
has been confirmed by Wells. He further believes that k(13) = 33, which 
would disprove (B). 

There seems to be some connection between the configuration at any 
stage and the factors of the number of configurations still available. ‘Infor- 
mation theory’ rules out some strategies: for example, for n = 12, we cannot 
use twice, during the procedure, the technique for arranging 5 balls in order, 
itself 15/16 efficient, for we have ‘lost’ too much information. 

Critical cases occur where $2^m$ is very close to, and larger than, n!, e.g. 5 is
quite critical (120 ‘just less than’ 128). We might define ‘criticalness’ as
$\{\log_2 n!\}$, where { } denotes fractional part. The distribution of this is an
interesting analytical problem.


Knuth (1973) gives another Steinhaus reference (1959) which suggests that 
Stanisław Trybuła & Czen Ping may have anticipated Ford & Johnson.

Halbeisen & Hungerbühler (1995) discuss the general counterfeit coin problem
and ask what is the best sequential or nonsequential strategy when more than one 
coin is counterfeit and of possibly differing weights? In particular, do sequential 
solutions, where later parts of the strategy depend on earlier weighings, always 
need less weighings than nonsequential ones? 


SUPPLEMENT 


As this column has only been appearing in alternate issues we have less to say than 
usual, and some readers may have felt starved of unsolved problems, so here are a 
few which might not have appeared had they had to sustain a whole article to 
themselves. 


Bernardo Recamán constructs the sequence $a_1 = 1$, and, for $n \ge 1$, $a_{n+1} = a_n/n$
or $a_n \times n$ according as n divides $a_n$, or does not, so that $a_2 = 1$, $a_3 = 2$, $a_4 = 6$,
$a_5 = 24$, $a_6 = 120$, $a_7 = 20$, $a_8 = 140$, ... . He asks for an estimate of $a_n$. Clearly
$(n - 1)!$ is an upper bound, and the product of the primes between n/2 and n, for
which a good estimate is $e^{n/2}$, is a lower bound. We can improve the latter by


noting that if $k < \sqrt{n}$ and p is a prime in the interval $\frac{n}{k+1} < p \le \frac{n}{k}$ then
$p^a \,\|\, a_{n+1}$, where a, k are of the same parity. In particular $p \mid a_{n+1}$ for $\frac{n}{2} < p \le n$;
for $\frac{n}{4} < p \le \frac{n}{3}$; for $\frac{n}{6} < p \le \frac{n}{5}$; ..., and by a theorem of Mertens, their product is
$\sim e^{n \ln 2} = 2^n$. This may (sometimes) be quite a good estimate: for example,
Ay ~ 121.7 x 27"', although these numbers have 66 decimal digits. 
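Recamán's recurrence is simple to experiment with. The following sketch generates the sequence and checks the trivial upper bound $(n-1)!$ for small n; the function name is an illustrative assumption.

```python
import math


def recaman_divide_multiply(count):
    """First `count` terms a_1, a_2, ... of the sequence with a_1 = 1 and
    a_{n+1} = a_n / n if n divides a_n, else a_n * n."""
    terms = [1]
    for n in range(1, count):
        a = terms[-1]
        terms.append(a // n if a % n == 0 else a * n)
    return terms


if __name__ == "__main__":
    seq = recaman_divide_multiply(12)
    print(seq)
    # a_n <= (n - 1)! for each n (seq is 0-indexed, so seq[n - 1] is a_n)
    print(all(seq[n - 1] <= math.factorial(n - 1) for n in range(1, 13)))
```

Because Python integers are unbounded, the same function can be used to compare $a_n$ with $2^n$ for much larger n.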


John P. Robertson (1995) relates Martin LaBar’s (1984) problem of finding a 3 × 3
magic square all of whose entries are squares to the problem of finding squares in 
arithmetic progression, to finding three rational right triangles with the same area 
and with the squares of their hypotenuses in arithmetic progression, to the 
congruent number problem [1980, 43], and hence to the family of elliptic curves 
$y^2 = x^3 - n^2 x$. Indeed, if three rational points on such a curve can be found, which
are the doubles of other rational points and whose x-coordinates are in arithmetic 
progression, then the problem is solved. Andrew Bremner gives the following 


specimen whose diagonals fail to give the magic sum. 


15²   20²   60²
36²   48²   25²
52²   39²    0²
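The arithmetic behind Bremner's specimen can be confirmed in a few lines. This sketch simply squares the entries and sums every row, column, and diagonal; the helper name is an assumption for the example.

```python
def line_sums(square):
    """Return (row sums, column sums, [main diagonal, anti-diagonal])."""
    size = len(square)
    rows = [sum(row) for row in square]
    cols = [sum(col) for col in zip(*square)]
    diags = [
        sum(square[i][i] for i in range(size)),
        sum(square[i][size - 1 - i] for i in range(size)),
    ]
    return rows, cols, diags


roots = [[15, 20, 60], [36, 48, 25], [52, 39, 0]]
square = [[v * v for v in row] for row in roots]

if __name__ == "__main__":
    rows, cols, diags = line_sums(square)
    print(rows, cols, diags)
```

All six row and column sums equal 4225 = 65², while the two diagonals give 2529 and 8608, so the diagonals indeed fail.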


John Robertson’s father, J. S. Robertson, appears twice in last February’s Monthly, 
on p. 167, where his initials are misprinted as J. A., and on p. 173, where they 
appear as J. B. 


Judah Rominek defines the function t(n) as the number of ways of factoring n, 
where permutations are not counted as different. For example, 12 = 6·2 = 4·3
= 3·2·2 so that t(12) = 4. What can be said about t(n)? Can it even be proved
that t(n) < n? Perhaps there is something in the paper of Yang & Wen (1994), not 
yet consulted. 
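The function t(n) can be computed by listing factors in non-decreasing order, so that rearrangements are not counted twice. The sketch below is one straightforward recursive way to do this; the function name and the convention t(1) = 1 are assumptions made for the example.

```python
def count_factorizations(n, smallest=2):
    """Number of ways to write n as a product of factors >= `smallest`,
    listed in non-decreasing order (so order does not matter).
    By convention the empty product gives count_factorizations(1) == 1."""
    if n == 1:
        return 1
    total = 0
    d = smallest
    while d * d <= n:
        if n % d == 0:
            # d is the smallest factor; the remaining factors are all >= d
            total += count_factorizations(n // d, d)
        d += 1
    if n >= smallest:
        total += 1  # n itself as a single factor
    return total


if __name__ == "__main__":
    print(count_factorizations(12))
    print(all(count_factorizations(n) <= n for n in range(1, 2000)))
```

The second line checks the question raised in the text, whether t(n) is at most n, over a small range only; it is evidence, not a proof.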


REFERENCES 

Ralph H. Buchholz and Susan M. Kelly, On rational-derived quartics, Bull. Austral. Math. Soc. 51(1995) 
121-132. [1989, 129] 

Jim Buddenhagen, Charles Ford and Mike May, Nice cubic polynomials, Pythagorean triples and the 
law of cosines, Math. Mag. 65(1992) 244-249, [1989, 129] 

David Callan, On evaluating permanents and a matrix of cotangents, Linear and Multilinear Algebra, 
38(1995) 193-205. [1994, 571] 

Julien Cassaigne and Steven Finch, A class of 1-additive sequences and quadratic recurrences, 
Experimental Math., (submitted). [1973, 919] 


John E. Cremona, Algorithms for Modular Elliptic Curves, Cambridge University Press, 1992.
Lester R. Ford and Selmer M. Johnson, A tournament problem, this MONTHLY, 66(1959) 387-389; MR
21 #1942. [1995, 164]
L. Fuchs, Ein kombinatorisches Problem bezüglich abelscher Gruppen, Math. Nachr., 18(1958)
292-297. [1993, 287]
Richard K. Guy and Richard J. Nowakowski, Mousetrap, in Combinatorics, Paul Erdős is Eighty,
Keszthely, 1993, Bolyai Society Math. Studies, 1993, 193-205. [1994, 1007]
Lorenz Halbeisen and Norbert Hungerbühler, The general counterfeit coin problem, Discrete Math.,
147(1995). [1995, 164]
Marshall Hall, A combinatorial problem on abelian groups, Proc. Amer. Math. Soc., 3(1952) 
584-587. [1993, 287] 
Kazuyuki Hatada, Problems on the n dimensional simplex, in John M. Rassias (editor) Geometry, 
Analysis and Mechanics, World Sci. Pub. Co., 1994, pp. 109-112. [1986, 628] 
J. H. B. Kemperman and Teunis J. Ott, Complementary permutations for abelian groups, Aequationes 
Math., 48(1994) 262-282. [1993, 287] 
V. Keränen, Abelian squares are avoidable on 4 letters, in W. Kuich (ed.) Automata, Languages and
Programming, Springer Lect. Notes Comput. Sci., 623(1992) 41-52. [1971, 886] 
Donald Ervin Knuth, The Art of Computer Programming, Vol. 3 Sorting and Searching, Addison-Wesley, 
1973, esp. §5.3. [1995, 164] 




Martin LaBar, Problem 270, Coll. Math. J., 15(1984) 69. [1995, this issue] 
Gary L. Mullen, Permutation polynomials over finite fields, in Finite Fields, Coding Theory and Advances
in Communications and Computing, (Las Vegas NV 1991), Dekker Lect. Notes Pure Appl. Math.,
141(1993) 131-151; MR 94d:11097. [1988, 243; 1993, 71]
Daniel J. Mundfrom, A problem in permutations: the game of ‘Mousetrap’, Europ. J. Combin., 15(1994) 
555-560. [1994, 1007] 
John P. Robertson, Magic squares of squares, Math. Mag., 68(1995) (to appear). [1995, this issue] 
Ken Shiriff and Clifford Pickover, 1-multiplicative Ulam sequences, (in preparation). [1973, 919] 
Hunter S. Snevily, The snake-in-the-box problem: A new upper bound, Discrete Math., 133(1994) 
307-314. [1970, 63] 
F. I. Solov’jeva, An upper bound for the length of a cycle in an n-dimensional cube, Diskret. Analiz., 
45(1987). [1970, 63] 
Hugo Dyonizy Steinhaus, Mathematical Snapshots, Oxford Univ. Press, New York 1950, pp. 37-40.
[1995, 164] 
Hugo Steinhaus, Calcutta Math. Soc. Golden Jubilee Commemoration, 2(1959) 323-327. [1995, 164] 


Johann Walter, Über ganze rationale Funktionen dritten Grades mit ganzzahligen Koeffizienten, bei
denen Nullstellen und Extrema zugleich ganzzahlig sind, Praxis Math., 29(1987) 489-492.
[1989, 129]
Mark B. Wells, Proc. Information Processing Congress 65, 2(1965) 497-498. [1995, 164]
Mark B. Wells, Elements of Combinatorial Computing, Pergamon, Oxford-New York-Toronto,
1971. [1995, 164]
Yang Yao-Chi and Wen Ren-Kai, A conjecture and an upper bound on the number of multiplicative 
partitions of natural numbers, Math. Appl. 7(1994) 390-397. [1995, this issue] 
Department of Mathematics Department of Mathematics 
The University of Calgary Dalhousie University 
Calgary, Alberta Halifax, Nova Scotia 


Canada T2N 1N4 Canada B3H 3J5 


The biologist can push it back to the original protist, and the chemist 
can push it back to the crystal, but none of them touch the real 
question of why or how the thing began at all. The astronomer goes 
back untold million of years and ends in gas and emptiness, and then 
the mathematician sweeps the whole cosmos into unreality and 
leaves one with mind as the only thing of which we have any 
immediate apprehension. Cogito ergo sum, ergo omnia esse videntur. 
All this bother, and we are no further than Descartes. Have you 
noticed that the astronomers and mathematicians are much the most 


cheerful people of the lot? I suppose that perpetually contemplating
things on so vast a scale makes them feel either that it doesn’t matter
a hoot anyway, or that anything so large and elaborate must have
some sense in it somewhere.


—Dorothy L. Sayers 


With Robert Eustace. The Documents in the Case. New York: 
Harper and Row, 1930, p. 54. 
PROBLEMS AND SOLUTIONS


Edited by: 
Richard T. Bumby, Fred Kochman and Douglas B. West 


Proposed problems should be sent to the MONTHLY PROBLEMS address given on the
inside front cover. Please include solutions and relevant references. Three copies
of all items needed to evaluate the problem should be sent.


Solutions of published problems should arrive at the MONTHLY PROBLEMS address
given on the inside front cover before May 31, 1996. If possible, solutions should
be typed with double spacing. Two copies suffice. Several solutions may be mailed
together, but they should be on separate sheets of paper. The problem number and
the solver’s name and mailing address should appear on each solution. A mailing
label should be included if an acknowledgment is desired.


The published solution is likely to be based on a solution that is complete and
correct. Additional information, such as references to other appearances of the
problem or its solution, is also welcome.


An asterisk (*) after the number of a problem, or part of a problem, indicates that
no solution is currently available.


PROBLEMS 


10487. Proposed by William P. Wardlaw, United States Naval Academy, Annapolis, MD. 


Let R be a commutative ring with 1 and let A be an n by n matrix over R. If $x = (x_1\ x_2\ \dots\ x_n)$
is a vector with entries in R, let (x) denote the ideal generated by the entries
of x. Show that (xA) = (x) for all n-tuples x over R if and only if A is invertible over R.


10488. Proposed by Murray S. Klamkin, University of Alberta, Edmonton, Alberta, Canada. 


Determine the extreme values of the sum of the lengths of three concurrent and mutually 
orthogonal chords of a given sphere of radius R if the point of concurrency is at a distance 
d from the center. 


10489. Proposed by Frank Schmidt, Arlington, VA. 


Let f(n) be the number of isomorphism classes of connected graphs on n vertices whose
automorphism group contains a Sylow 2-subgroup of the symmetric group $S_n$. For example,
f(3) = 2 = f(4). Show that f(n) is an even number for $n \ge 3$.
10490. Proposed by Seung-Jin Bang, Ajou University, Suwon, Korea.


Show that 


for all positive integers n. 


10491. Proposed by Jean-Pierre Grivaux, Lycée Chaptal, Paris, France. 


Let B be an open ball containing the origin in the Euclidean space $\mathbb{R}^n$, and let V denote
its volume. B is cut into $2^n$ parts by the coordinate hyperplanes


Il; = {(x1,.--,%n) 1 x7 = OF 


fori = 1,...,n. Prove that at least 2”—! of these parts have volume at most V/2r-!, 


10492. Proposed by William Duke, Mathematical Sciences Research Institute, Berkeley, CA. 


Let n be a positive integer. Show that the only integral polynomials of degree less than
n that are real and nonnegative at all n-th roots of unity and have constant term 1 are of the
form


\[ 1 + x^d + x^{2d} + \cdots + x^{n-d} \]
with $d \mid n$, or


\[ 1 - x^d + x^{2d} - \cdots - x^{n-d} \]


with $2d \mid n$.


10493. Proposed by Richard P. Stanley, Massachusetts Institute of Technology, Cambridge, 
MA, and Christophe Reutenauer, Université du Québec a Montréal, Montréal, Canada. 


Fix a positive integer k. Let $f_k(m, n)$ be the number of m-tuples $a = (a_0, a_1, \dots, a_{m-1})$
of integers satisfying: (a) $0 \le a_i \le n - 1$ for all i, and (b) any k circularly consecutive
entries of a (i.e., $a_j, a_{j+1}, \dots, a_{j+k-1}$, where the subscripts are taken modulo m so that they
lie between 0 and m − 1) are all distinct. Show that the generating function


\[ F_k(x, n) = \sum_{m \ge 1} f_k(m, n)\, x^m \]


is a quotient of two polynomials in x and n.


NOTES 




(10493) Note that it does not suffice to show that $F_k(x, n)$ is a rational function of x for each
fixed n. Consider the case k = 2: $f_2(m, n)$ can be shown to be $(n - 1)^m + (-1)^m (n - 1)$
for n > 1. From this, it follows that $F_2(x, n) = n(n - 1)x^2 / ((1 + x)(1 - (n - 1)x))$.
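The closed form quoted for k = 2 can be compared with a direct count for small m and n. The sketch below enumerates all m-tuples by brute force; the function names are illustrative, and the enumeration is feasible only for small parameters.

```python
from itertools import product


def count_tuples(k, m, n):
    """f_k(m, n): m-tuples over {0, ..., n-1} in which any k circularly
    consecutive entries are all distinct (brute-force enumeration)."""
    total = 0
    for a in product(range(n), repeat=m):
        if all(len({a[(j + i) % m] for i in range(k)}) == k for j in range(m)):
            total += 1
    return total


def closed_form_k2(m, n):
    """The expression (n-1)^m + (-1)^m (n-1) quoted in the note for k = 2."""
    return (n - 1) ** m + (-1) ** m * (n - 1)


if __name__ == "__main__":
    for n in range(2, 5):
        print(n, [count_tuples(2, m, n) for m in range(1, 7)],
              [closed_form_k2(m, n) for m in range(1, 7)])
```

Note that for m = 1 the window of k = 2 circularly consecutive entries repeats the single entry, so the count is 0, in agreement with the closed form.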
SOLUTIONS


A Random Replacement Process 


10201 [1992, 163]. Proposed by Gunnar Blom, University of Lund and Lund Institute of 
Technology, Lund, Sweden. 


An urn contains one white ball and one black ball. Draw a ball at random. With 
probability 1/2 return it to the urn; otherwise (again with probability 1/2) put a ball of 
the opposite color in the urn. Perform n such drawings in succession. Find the mean and 
variance of the number $X_n$ of white balls appearing in the n drawings. Find the limiting
distribution of $n^{-1/2}(X_n - E(X_n))$.


Solution I, giving mean and variance, by Peter Griffin, California State University, Sacramento,
CA. The mean and variance of $X_n$ are $n/2$ and $n/4 - 1/2 + 1/2^n$ respectively. The
limiting distribution of $n^{-1/2}(X_n - E(X_n))$ is normal, with mean 0 and variance 1/4.

To prove this, define three possible states for the urn immediately before the $n$th drawing,
namely $S_0 = \{B, B\}$, $S_1 = \{B, W\}$, and $S_2 = \{W, W\}$. The rules for drawing and
replacement make the sequence of successive states into a Markov chain with transition
matrix


\[ P = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/4 & 1/2 & 1/4 \\ 0 & 1/2 & 1/2 \end{pmatrix}. \]
By induction the n step transition matrix (for n > 0) is
\[ P^n = \begin{pmatrix} 1/4 + 1/2^{n+1} & 1/2 & 1/4 - 1/2^{n+1} \\ 1/4 & 1/2 & 1/4 \\ 1/4 - 1/2^{n+1} & 1/2 & 1/4 + 1/2^{n+1} \end{pmatrix}. \]

The initial distribution of states is (0, 1, 0); hence the distribution at all later times is
(1/4, 1/2, 1/4).

Define $Y_n = 1$ or 0 according to whether the $n$th ball drawn is white or black. Then
$E(X_n) = \sum_{i=1}^{n} E(Y_i)$, and $E(Y_i) = 1/2$, using the expressions for the distribution of states.
Thus $E(X_n) = n/2$ as might have been anticipated by symmetry.

To compute $\mathrm{Var}(X_n) = \sum_{i=1}^{n} \mathrm{Var}(Y_i) + 2 \sum_{1 \le i < j \le n} \mathrm{Cov}(Y_i, Y_j)$, observe first that
$\mathrm{Var}(Y_i) = 1/4$. Also $\mathrm{Cov}(Y_i, Y_j) = E(Y_i Y_j) - (1/2)^2 = \Pr(Y_i Y_j = 1) - 1/4$. Let S be
the state just before the $i$th drawing, $S'$ the state just before the $(i + 1)$st drawing, and $S''$
the state just before the $j$th drawing. Then


2 
Pr¥i¥j =1)= S| Pr(¥i¥; =1,S =k, S' =1,S" =m) 
k,l,m=0 
= y_ Pr (Yj =1|S”=m)-Pr(s"=m|S' =1) 
k,l,m 


-Pr(S' =1,Y; =1|S=k)-Pr(S =k) 


where we have used an implicit independence of past events, which leads to the strong 
Markov property of this process, to write $\Pr(Y_j = 1 \mid S'' = m, S' = l, S = k, Y_i = 1)$ as 
$\Pr(Y_j = 1 \mid S'' = m)$ and $\Pr(S'' = m \mid S' = l, S = k, Y_i = 1)$ as $\Pr(S'' = m \mid S' = l)$. 
The terms $\Pr(Y_j = 1 \mid S'' = m)$ and $\Pr(S' = l, Y_i = 1 \mid S = k)$ are immediately 
computed from the drawing and replacement rules; the term $\Pr(S'' = m \mid S' = l)$ is the 
$(l, m)$ entry of $P^n$ for $n = j - i - 1$ if $j - i > 1$, otherwise the Kronecker $\delta_{lm}$. 

The term $\Pr(S = k)$ is the $k$th entry of the state vector, so there are two cases depending 
on whether $i = 1$ or $i > 1$. For $i = 1$ we find $\Pr(Y_1 Y_j = 1) = 1/4 - 1/2^{j+1}$; for $i > 1$ 
we find $\Pr(Y_i Y_j = 1) = 1/4$. (In particular, for $i > 1$, $Y_i$ and $Y_j$ are independent.) Thus 
finally 

$$\mathrm{Var}(X_n) = \frac{n}{4} - 2 \sum_{j=2}^{n} \frac{1}{2^{j+1}} = \frac{n}{4} - \frac{1}{2} + \frac{1}{2^n}.$$
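
The mean $n/2$ and variance $n/4 - 1/2 + 1/2^n$ stated at the start of this solution can be cross-checked without the covariance bookkeeping. The sketch below propagates the exact joint distribution of (white balls in the urn, white balls drawn so far); it is an independent check under the stated urn rules, not the solver's method, and the function names are illustrative.

```python
from fractions import Fraction

def exact_mean_and_variance(n):
    """Exact mean and variance of X_n, the number of white balls drawn in n drawings."""
    # dist maps (white balls in urn, white balls drawn so far) -> probability
    dist = {(1, 0): Fraction(1)}
    for _ in range(n):
        nxt = {}
        for (w, x), p in dist.items():
            for drew_white, p_draw in ((1, Fraction(w, 2)), (0, Fraction(2 - w, 2))):
                if p_draw == 0:
                    continue
                # With probability 1/2 return the ball; otherwise put in the opposite color.
                for new_w in (w, w - 1 if drew_white else w + 1):
                    key = (new_w, x + drew_white)
                    nxt[key] = nxt.get(key, Fraction(0)) + p * p_draw * Fraction(1, 2)
        dist = nxt
    mean = sum(p * x for (_, x), p in dist.items())
    second_moment = sum(p * x * x for (_, x), p in dist.items())
    return mean, second_moment - mean ** 2

def stated_variance(n):
    """Variance claimed in the solution: n/4 - 1/2 + 1/2^n."""
    return Fraction(n, 4) - Fraction(1, 2) + Fraction(1, 2 ** n)
```

For each $n$ tried, `exact_mean_and_variance(n)` agrees with `(Fraction(n, 2), stated_variance(n))`.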


Solution II, establishing asymptotic distribution, by Wolfgang J. Bühler, Johannes 
Gutenberg-Universität, Mainz, Germany. The asymptotics of the process are most easily 
seen by observing that after each turn a white ball is returned to the urn with probability 
1/2, independently of anything else that has happened. Therefore, the number $Z_n$ of white 
balls returned to the urn after the first $n$ drawings is the sum of a fair coin tossing process, 
to which the central limit theorem applies. Since $X_n$ and $Z_n$ can differ by at most 1, 
$(X_n - E(X_n))/\sqrt{n}$ and $(Z_n - E(Z_n))/\sqrt{n}$ have the same asymptotic distribution, by 
lemma 2 of [2, section VIII.2]. Note that this approach shows that the initial distribution 
does not affect the asymptotic behavior of the process. 
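
Both observations used here, that the number of white balls put into the urn is a fair coin-tossing sum and that it differs from the number of white balls drawn by at most 1, can be confirmed exactly for small $n$. The sketch below tracks the joint distribution of (white balls in urn, white drawn, white put in); the names `x` and `z` stand for $X_n$ and $Z_n$, and the helper names are illustrative. Each step changes `x - z` by the opposite of the change in the urn's white count, which is why the difference stays in $\{-1, 0, 1\}$.

```python
from fractions import Fraction
from math import comb

def joint_distribution(n):
    """Map (white in urn, white drawn, white put into urn) -> probability after n turns."""
    dist = {(1, 0, 0): Fraction(1)}
    for _ in range(n):
        nxt = {}
        for (w, x, z), p in dist.items():
            for drew_white, p_draw in ((1, Fraction(w, 2)), (0, Fraction(2 - w, 2))):
                if p_draw == 0:
                    continue
                # Either the drawn ball goes back, or a ball of the opposite color does.
                for put_white in (drew_white, 1 - drew_white):
                    key = (w - drew_white + put_white, x + drew_white, z + put_white)
                    nxt[key] = nxt.get(key, Fraction(0)) + p * p_draw * Fraction(1, 2)
        dist = nxt
    return dist

def differ_by_at_most_one(n):
    """True if |X_n - Z_n| <= 1 on every outcome of positive probability."""
    return all(abs(x - z) <= 1 for (_, x, z) in joint_distribution(n))

def z_is_fair_binomial(n):
    """True if Z_n has exactly the Binomial(n, 1/2) distribution."""
    marginal = {}
    for (_, _, z), p in joint_distribution(n).items():
        marginal[z] = marginal.get(z, Fraction(0)) + p
    return all(marginal.get(z, 0) == Fraction(comb(n, z), 2 ** n) for z in range(n + 1))
```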


Editorial comment. The selections above are extracted from more complete solutions that 
were submitted. For example, Peter Griffin’s method led to the use of the mixing central 
limit theorem (Theorem 27.5 of [1]) to obtain the asymptotic distribution. In particular, it 
is not necessary to invent an independent process in order to apply a weaker theorem. The 
applicability here depends on the fact that $P^2$ has all its entries positive, and an argument 
very similar to that of example 27.5 of [1]. To allow for the fact that the initial distribution 
is not the stationary distribution of the chain, follow the argument of Theorem 19.1.1 of [7]. 

Ellen Hertz analyzed the process by distinguishing the positions of the two balls, with the 
replacement always being put in the vacant position. She introduced the random variable 
T, the earliest time at which both positions have been touched. Conditioned on T, $X_n$ 
becomes, essentially, a sum of independent random variables. This simplifies the rest of the 
analysis. 

Generating functions provide an efficient organization of the details of Solution I. The 
transition matrices lead to linear equations relating the generating functions for the different 
states. Robin Chapman included solutions of these equations. Surya Narayana noted that a 
general method along these lines has been developed in [3]. Other references giving details 
and examples of the use of generating functions to establish asymptotic normality can be 
found in [4], [5], and [6]. 
7. I. Ibragimov & Yu. V. Linnik, Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff, 1971. 


Solved also by R. J. Chapman (U. K.), V. Hernandez (Spain), E. Hertz, J. H. Lindsey II, S. Narayana, M. Stamp, 
J. Vogel, and the proposer. 
A Solitaire Army on the Line 


10287 [1993, 185]. Proposed by Dr. A. Keith Austin, The University of Sheffield, Sheffield, 
England. 


We have a doubly-infinite (i.e., indexed by Z) row of squares and we start with counters in 
those squares to the left of some point (e.g., those with negative index). For a fixed positive 
integer k, the allowable moves consist of selecting k consecutive squares, discarding one of 
the counters in those squares, and rearranging the remaining counters within the k selected 
squares (with at most one counter in a square). Prove or disprove that there is an integer 
N = N(k) such that no sequence of moves will allow a counter to be placed N squares into 
the region which originally contained no counters. 


Solution by Ilias Kastanas, California State University, Los Angeles, CA. There is such 
an integer $N$. For $k = 1, 2$, we observe that we can never move a counter into non-negative 
territory. For $k > 2$, let $\lambda > 1$ be a real number satisfying the equation 


$$1 + \lambda + \lambda^2 + \cdots + \lambda^{\lfloor (k-1)/2 \rfloor} = \lambda^{\lfloor (k+2)/2 \rfloor} + \cdots + \lambda^{k-1}.$$


The defining equation has one more term on the left than on the right. If we delete the 
same number of terms from each side, what remains on the left exceeds what remains on the 
right. In particular, the sum of $l$ members of $\{1, \lambda, \ldots, \lambda^{k-1}\}$ is at least $\sum_{i=0}^{l-1} \lambda^i$, the sum 
of $l - 1$ members is at most $\sum_{i=k-l+1}^{k-1} \lambda^i$, and the former is at least the latter. Hence no move 
can increase $\sum_{m \in S} \lambda^m$, where $S$ is the set of positions containing counters. Originally, 
$\sum_{m \in S} \lambda^m = \sum_{m=1}^{\infty} \lambda^{-m} = 1/(\lambda - 1)$. If $N$ is large enough so that $\lambda^N > 1/(\lambda - 1)$, then 
no counter can reach position $N$. 
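
A numerical sketch of the weight argument follows. It assumes the defining equation reads $1 + \lambda + \cdots + \lambda^{a} = \lambda^{k-a} + \cdots + \lambda^{k-1}$ with $a = \lfloor (k-1)/2 \rfloor$, which is one way to have exactly one more term on the left than on the right; the function names and the bisection bracket $(1, 2)$ are choices made for this illustration.

```python
import math

def defining_gap(lam, k):
    """Left side minus right side of the assumed defining equation."""
    a = (k - 1) // 2
    left = sum(lam ** i for i in range(a + 1))
    right = sum(lam ** i for i in range(k - a, k))
    return left - right

def solve_lambda(k, iterations=200):
    """Bisection for a root in (1, 2): the gap is 1 at lam = 1 and negative at lam = 2 when k > 2."""
    if k <= 2:
        raise ValueError("the equation is only used for k > 2")
    lo, hi = 1.0, 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if defining_gap(mid, k) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def moves_never_increase_weight(k, tol=1e-9):
    """Check that the l lowest weights are at least the l - 1 highest weights, for l = 1..k."""
    lam = solve_lambda(k)
    for l in range(1, k + 1):
        lowest_l = sum(lam ** i for i in range(l))
        highest_l_minus_1 = sum(lam ** i for i in range(k - l + 1, k))
        if lowest_l < highest_l_minus_1 - tol:
            return False
    return True

def threshold_exponent(k):
    """Real number t with lam^t = 1/(lam - 1); any integer N above t satisfies lam^N > 1/(lam - 1)."""
    lam = solve_lambda(k)
    return math.log(1 / (lam - 1)) / math.log(lam)
```

For $k = 3$ the root is the golden ratio. The tolerance in `moves_never_increase_weight` is needed because the comparison is an equality for $l = a + 1$.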


Solved also by R. J. Chapman (U. K.), B. Doran, W. Goddard & D. J. Kleitman, R. Holzsager, O. P. Lossers 
(The Netherlands), G. Myerson (Australia), B. Peterson, B. Ravikumar, A. Riese, F. Schmidt, A. N. ’t Woord (The 
Netherlands), GCHQ Problem Solving Group (U. K.), and the National Security Agency Problems Group. 


A Rediscovery of Pólya 


10290 [1993, 290]. Proposed by David Allison, University of Cape Town, Rondebosch, 
South Africa. 


Let $c \in \mathbb{N}$. Consider the expression $S_c(n) = \sum_{r=1}^{n} r^c$. 
(a) Show that $S_c(n)/S_1^2(n)$ is a polynomial in $S_1(n)$ when $c$ is odd and $c > 1$. 
(b) Show that $S_c(n)/S_2(n)$ is a polynomial in $S_1(n)$ when $c$ is even. 
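
A few low-degree instances make the claim concrete. The sketch below checks, for a range of $n$, the standard identities $S_3 = S_1^2$, $3S_5 = S_1^2(4S_1 - 1)$, $3S_7 = S_1^2(6S_1^2 - 4S_1 + 1)$, $5S_4 = S_2(6S_1 - 1)$ and $7S_6 = S_2(12S_1^2 - 6S_1 + 1)$. It illustrates parts (a) and (b) for $c \le 7$ only and is not a proof; the function names are illustrative.

```python
def power_sum(c, n):
    """S_c(n) = 1^c + 2^c + ... + n^c."""
    return sum(r ** c for r in range(1, n + 1))

def faulhaber_instances_hold(n):
    """Check the low-degree identities for one value of n, using integer arithmetic only."""
    s1, s2 = power_sum(1, n), power_sum(2, n)
    odd_cases = (
        power_sum(3, n) == s1 ** 2
        and 3 * power_sum(5, n) == s1 ** 2 * (4 * s1 - 1)
        and 3 * power_sum(7, n) == s1 ** 2 * (6 * s1 ** 2 - 4 * s1 + 1)
    )
    even_cases = (
        5 * power_sum(4, n) == s2 * (6 * s1 - 1)
        and 7 * power_sum(6, n) == s2 * (12 * s1 ** 2 - 6 * s1 + 1)
    )
    return odd_cases and even_cases
```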


Editorial comment. As the Con Amore Problem Group observed, the statement appears 
as 3.8 and 3.9 of George Pólya’s Mathematical Discovery, Vol. 1 (Wiley, 1962). Pólya 
gives, as 3.7 and 3.8, formulas which give the results immediately by induction. The 
elementary solutions provided by readers used the same method as Pólya. Other solutions 
were longer, more complicated, or used more advanced results. In particular, these sums may 
be expressed in terms of values of Bernoulli polynomials, allowing the result to be extracted 
from many references. The papers of Edwards, cited below, note that work on instances of 
these identities has been traced back to ancient times, although the idea of writing the $S_c$ 
in terms of $S_1$ is credited to Johannes Faulhaber (1580-1635). Authors mentioned below 
describe their encounter with a copy of his Academia Algebrae, published in 1631, in which 
these formulas are developed. A biography of Faulhaber by Ivo Schneider was published by 
Birkhäuser in 1993. 
Many solvers cited other sources that, if not the exact solution of the problem, provide 
formulas that give the result quickly. A sample is given below (with the contributing readers in 
parentheses). 


REFERENCES 


O. D. Anderson, “Summing powers of integers”, Mathematical Spectrum 23 (1990-91), 116-121 (S.-J. Bang). 

B. C. Berndt, Ramanujan’s Notebooks I (Springer, 1985), 157-158 (H.-J. Seiffert). 

P. Bachmann, Niedere Zahlentheorie (Leipzig 1902, 1910), Vol. 2, p. 293 (reprinted by Chelsea, 1968) (M. Vowe). 

S. Barnard and J. M. Child, Higher Algebra (Macmillan, 1936), VIII.8 (A. Caicedo Núñez). 

A. W. F. Edwards, “Sums of powers of integers”, Mathematical Gazette 66 (1982), 22-28 (F. Flanigan, H. Krishnapriyan). 

A. W. F. Edwards, “A quick route to sums of powers”, this MONTHLY 93 (1986), 451-455 (F. Flanigan, A. Pedersen). 

D. E. Knuth, “Johann Faulhaber and sums of powers”, Math. Comp. 61 (1993), 277-294 (I. Nemes). 


Solved also by S.-J. Bang (Korea), R. Barbara (Lebanon), V. Božin (student, Yugoslavia), A. E. Caicedo Núñez 
(student, Colombia), R. J. Chapman (U. K.), F. J. Flanigan, J. Fukuta (Japan), R. Holzsager, F. T. Howard, L. N. Howard, 
H. Kappus (Switzerland), I. Kastanas, B. G. Klein, H. K. Krishnapriyan, S. Liu, O. P. Lossers (The Netherlands), 
A. D. Melas (Greece), I. Nemes (Austria), A. Pedersen (Denmark), F. Schmidt, H.-J. Seiffert (Germany), R. Stong, 
M. Vowe (Switzerland), A. N. ’t Woord (The Netherlands), Anchorage Math Solutions Group, Con Amore Problem 
Group (Denmark), and the proposer. 


A Large Intersection of Large Sets 


10373 [1994, 274]. Proposed by M. J. Pelling, Balliol College, Oxford, England. 


Let $E_n \subset I = [0, 1]$ be a sequence of measurable sets in the unit interval with measures 
$m E_n \ge \delta > 0$ bounded away from zero. Prove that there is a subsequence $(E_{n_i})$ whose 
intersection has the cardinality of the continuum. 


Solution by Kenneth Schilling, University of Michigan, Flint, MI. By passing to a subsequence 
we may assume that the sequence $(\chi_{E_n})$ of characteristic functions converges in the 
weak* topology for, say, $L^2$ to the measurable function $f : [0, 1] \to \mathbb{R}$. In particular, then 


$$\lim_{n \to \infty} m(E_n \cap F) = \int_F f$$


for all measurable sets $F \subset [0, 1]$. 

Taking $F = [0, 1]$, we see that $\int f \ge \delta > 0$; thus there exist $\delta' > 0$ and $G \subset [0, 1]$ 
such that $m(G) > 0$ and $f > \delta'$ on $G$. Thus 

$$\lim_{n \to \infty} m(E_n \cap F) \ge \delta' \cdot m(F) \qquad (*)$$

for all measurable sets $F \subset G$. 

We now construct, by recursion, a family $(G_s)$ of closed subsets of $G$ of positive measure, 
indexed by the elements $s$ of the set Sq of finite binary sequences, and a sequence $N_i$, such 
that the following properties hold for all $s, s' \in$ Sq: (1) $G_s \cap G_{s'} = \emptyset$ for $s \ne s'$ of the same 
length, (2) $G_s \supset G_{s'}$ for $s \subset s'$, and (3) $G_s \subset E_{N_i}$ for $s$ of length $i$. 

Choose $N_0$ so that $G \cap E_{N_0}$ has positive measure, and let $G_\emptyset = G \cap E_{N_0}$, where $\emptyset$ denotes 
the empty sequence. Now, suppose that $G_s$ has been suitably defined for all $s$ of length $k$ or 
less, and $N_i$ has been defined for $i \le k$. Given $s \in$ Sq of length $k$, temporarily let $A_s$ and 
$B_s$ be disjoint subsets of $G_s$ of positive measure. Using (*), there exists $N_{k+1} > N_k$ such 
that both $E_{N_{k+1}} \cap A_s$ and $E_{N_{k+1}} \cap B_s$ have positive measure for all $s$ of length $k$. Now, for 
each $s$ of length $k$, let $G_{s0}$ and $G_{s1}$ be closed subsets of positive measure of $E_{N_{k+1}} \cap A_s$ and 
$E_{N_{k+1}} \cap B_s$, respectively. This completes the construction. 

For every infinite binary sequence $\sigma$, the sets $G_{\sigma_1 \ldots \sigma_i}$ are nested, nonempty and compact, 
so the set $K_\sigma = \bigcap_{i=1}^{\infty} G_{\sigma_1 \ldots \sigma_i}$ is nonempty. Also, the sets $K_\sigma$ are disjoint for distinct $\sigma$, so 
$\bigcup_\sigma K_\sigma$ has the cardinality of the continuum. Finally, $G_{\sigma_1 \ldots \sigma_i} \subset E_{N_i}$ for all $\sigma$ and $i$. Thus, 
$\bigcup_\sigma K_\sigma \subset \bigcap_i E_{N_i}$ and the proof is complete. 


Editorial comment. Zbigniew Lipecki noted that stronger and more precise results may 
be found in P. Erdős, H. Kestelman, and C. A. Rogers, “An intersection property of sets 
with positive measure”, Colloq. Math. 11 (1963), 75-80. A still stronger result (implying, 
for example, that there is a subsequence whose intersection has Hausdorff dimension 1) can 
be found in P. Erdős and S. J. Taylor, “The Hausdorff measure of the intersection of sets 
of positive Lebesgue measure”, Mathematika 10 (1963), 1-9. The title of this solution is 
borrowed from an article by Paul Halmos, this MONTHLY 99 (1992), 307-312 (the author 
credited this article with suggesting the problem, and reference to it was given in 
a note accompanying the statement of the problem). 


Solved also by R. B. Israel (Canada), I. Krzeminska (Poland), Z. Lipecki (Poland), A. N. ’t Woord (The Netherlands), 
and the proposer. 


REVIVALS 


A few misprints were noted, which will be corrected here. 


Comparing Sums of Numbers with Equal Products 


6667 [1991, 766; 1994, 914]. Proposed by George Baloglou and Phil Tracy, State University 
of New York, College at Oswego. 


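If $a_1, a_2, \ldots, a_n$, $b_1, b_2, \ldots, b_n$ are positive numbers such that 


$$a_1 a_2 \cdots a_n = b_1 b_2 \cdots b_n \quad \text{and} \quad \sum_{1 \le i < j \le n} |a_i - a_j| \le \sum_{1 \le i < j \le n} |b_i - b_j|,$$


(i) prove that 


$$\sum_{i=1}^{n} a_i \le (n - 1) \sum_{i=1}^{n} b_i, \quad \text{and}$$


(ii) show that the factor $n - 1$ cannot be replaced by a smaller one. 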


Editorial comment. A mistake that could only be made by a computer took the asterisk 
indicating that Problem 6080 [1976, 205; 1994, 913] had been initially published without 
a solution and placed it on part (i) of this problem. The proposers had, indeed, submitted 
solutions to both parts of this problem. The proposers’ lengthy solution (of both (i) and (ii)) 
relies on a study, for $L > 0$ and $n > 2$, of $\sum_{i=1}^{n} x_i$ on 


$$D(L) = \Bigl\{ (x_1, \ldots, x_n) : x_i > 0,\ x_1 x_2 \cdots x_n = 1,\ \sum_{1 \le i < j \le n} |x_i - x_j| = (n - 1)L \Bigr\}.$$


Denote the extreme values of $\sum_{i=1}^{n} x_i$ on $D(L)$ by max(L) and min(L). The proposers were 
able to show that $\max(L)/\min(L) \le n - 1$ and other properties of these quantities 
essential to solving the problems. However, determination of these extreme values remains 
open. 